Alert

This post was written with the help of Claude Code.

TL;DR

  • A regular expression (regex) is a mini-language for finding patterns in strings.
  • In Python it is used through the re module; once you understand the pattern syntax, you can apply it directly to log parsing, input validation, text extraction, and similar tasks.
  • This post walks through the pattern syntax step by step.


1. What is a regular expression?

A regular expression (regex) is a notation for finding specific patterns in a string. Think of it as a mini-language for saying "find text shaped like this" inside a string.

In Python, regular expressions are provided by the standard-library re module.

import re

# Find every run of consecutive digits
re.findall(r'\d+', 'order 12345, quantity 3')
# ['12345', '3']

To understand this one line you need to know what \d+ is and what findall does. Let's start with the pattern syntax, one piece at a time.

raw string (r'...')

Write regex patterns as raw strings, r'...'. With the r prefix Python does not process backslash escapes in the literal, so regex syntax such as \d and \b reaches the re module unchanged.

# ❌ Without r, \b is interpreted as a backspace (\x08)
re.search('\bword\b', 'a word here')     # None

# ✅ Raw string
re.search(r'\bword\b', 'a word here')    # Match
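
To make the difference concrete, the short check below compares the two literals directly. It is only a sketch; the variable names plain and raw are illustrative.

```python
import re

plain = '\bword\b'    # each '\b' becomes the single backspace character (\x08)
raw = r'\bword\b'     # backslash and 'b' stay as two separate characters

print(len(plain), len(raw))    # 6 8
print(plain[0] == '\x08')      # True

# The non-raw pattern only matches text that contains literal backspaces
print(bool(re.search(plain, 'a word here')))          # False
print(bool(re.search(plain, 'a \x08word\x08 here')))  # True
print(bool(re.search(raw, 'a word here')))            # True
```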

2. Pattern basics: matching a single character

The most basic regex rule matches a single character.

Literal matching

An ordinary character matches itself.

re.findall(r'a', 'banana')
# ['a', 'a', 'a']
 
re.findall(r'hello', 'say hello to hello world')
# ['hello', 'hello']

Metacharacters

Some characters have a special meaning in a regex. These are called metacharacters.

. ^ $ * + ? { } [ ] \ | ( )

To match one of these characters literally, put a \ in front of it.

re.findall(r'\.', 'version 3.11.0')    # ['.', '.']
re.findall(r'\$', 'price: $100')       # ['$']
re.findall(r'\(', 'func(x)')           # ['(']

. (dot): any single character

. matches any single character except a newline (\n).

re.findall(r'a.c', 'abc adc a1c a c aXXc')
# ['abc', 'adc', 'a1c', 'a c']
# 'aXXc' does not match: there are two characters between a and c

Character class [...]: one of these

Matches any one of the characters listed inside the square brackets.

re.findall(r'[aeiou]', 'hello world')
# ['e', 'o', 'o']

# Ranges
re.findall(r'[a-z]', 'Hello 123')
# ['e', 'l', 'l', 'o']

re.findall(r'[0-9]', 'abc 123 def')
# ['1', '2', '3']

# Combining several ranges
re.findall(r'[a-zA-Z0-9]', 'Hi! 3?')
# ['H', 'i', '3']

[^...]: everything except these

A ^ placed first inside the brackets matches any character other than the ones listed.

re.findall(r'[^0-9]', 'abc 123')
# ['a', 'b', 'c', ' ']
 
re.findall(r'[^aeiou ]', 'hello world')
# ['h', 'l', 'l', 'w', 'r', 'l', 'd']

Predefined character classes

Frequently used classes have shorthand forms.

Shorthand | Meaning | Equivalent (ASCII)
\d | digit | [0-9]
\D | non-digit | [^0-9]
\w | word character (letter, digit, _) | [a-zA-Z0-9_]
\W | non-word character | [^a-zA-Z0-9_]
\s | whitespace (space, tab, newline) | [ \t\n\r\f\v]
\S | non-whitespace | [^ \t\n\r\f\v]

re.findall(r'\d', 'abc 123')      # ['1', '2', '3']
re.findall(r'\w', 'hi! 3?')       # ['h', 'i', '3']
re.findall(r'\s', 'a b\tc\nd')    # [' ', '\t', '\n']

An uppercase shorthand is the negation of its lowercase form: \d↔\D, \w↔\W, \s↔\S.

\w and Hangul

In Python 3, \w in a str pattern covers Unicode word characters, so Hangul matches too.

re.findall(r'\w', '안녕 hello 세계')
# ['안', '녕', 'h', 'e', 'l', 'l', 'o', '세', '계']

To match ASCII only, use the re.ASCII flag. (Flags are covered later.)
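
As a quick preview of that flag, the sketch below runs the same \w+ pattern with and without re.ASCII. The sample text and variable names are illustrative.

```python
import re

text = '안녕 hello 세계'

# Default (Unicode) mode: Hangul syllables count as word characters
unicode_words = re.findall(r'\w+', text)
print(unicode_words)   # ['안녕', 'hello', '세계']

# re.ASCII restricts \w to [a-zA-Z0-9_]
ascii_words = re.findall(r'\w+', text, re.ASCII)
print(ascii_words)     # ['hello']
```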


3. Extending patterns: repetition (quantifiers)

So far each element matched exactly one character. A quantifier specifies how many times the preceding element repeats.

*: zero or more

re.findall(r'ab*c', 'ac abc abbc abbbc')
# ['ac', 'abc', 'abbc', 'abbbc']
# b occurring 0 times (ac), once (abc), twice (abbc), and 3 times (abbbc) all match

+: one or more

re.findall(r'ab+c', 'ac abc abbc abbbc')
# ['abc', 'abbc', 'abbbc']
# 'ac', with zero b's, does not match

?: zero or one

re.findall(r'colou?r', 'color colour')
# ['color', 'colour']
# the u may be present or absent

re.findall(r'https?://', 'http://a https://b')
# ['http://', 'https://']

{m}: exactly m times

re.findall(r'\d{3}', '1 12 123 1234')
# ['123', '123']
# in 1234, the first three digits '123' match

{m,n}: at least m and at most n times

re.findall(r'\d{2,4}', '1 12 123 1234 12345')
# ['12', '123', '1234', '1234']

Combining quantifiers with character classes

This is where regex becomes powerful. Let's combine what we have learned so far.

# \d+ : one or more consecutive digits
re.findall(r'\d+', 'order 12345, quantity 3')
# ['12345', '3']

# \w+ : one or more consecutive word characters
re.findall(r'\w+', 'hello world 123')
# ['hello', 'world', '123']

# [a-z]+ : one or more consecutive lowercase letters
re.findall(r'[a-z]+', 'Hello World 123')
# ['ello', 'orld']

# [A-Za-z]+ : one or more consecutive ASCII letters
re.findall(r'[A-Za-z]+', 'Hello World 123')
# ['Hello', 'World']

Greedy vs Lazy

Quantifiers are greedy by default: they take the longest string that still lets the overall pattern match.

html = '<b>bold</b> and <i>italic</i>'

re.findall(r'<.*>', html)
# ['<b>bold</b> and <i>italic</i>']
# .* consumes as much as it can, so everything from the first < to the last > is one match

Adding ? after a quantifier makes it lazy: it takes the shortest string that still lets the overall pattern match.

re.findall(r'<.*?>', html)
# ['<b>', '</b>', '<i>', '</i>']
# .*? consumes as little as it can, so each < > pair is matched separately

Greedy | Lazy | Behavior
* | *? | zero or more (as long as possible vs. as short as possible)
+ | +? | one or more (as long as possible vs. as short as possible)
? | ?? | zero or one (as long as possible vs. as short as possible)
{m,n} | {m,n}? | m to n times (as long as possible vs. as short as possible)

# Another example
text = 'aaa'

re.findall(r'a+', text)    # ['aaa']  (greedy: as many a's as possible)
re.findall(r'a+?', text)   # ['a', 'a', 'a']  (lazy: as few as possible, one at a time)
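
A common alternative to a lazy quantifier is a negated character class, which says directly what may appear between the delimiters. The sketch below reuses the html string from above and compares the two approaches; for this input they give the same result.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

lazy = re.findall(r'<.*?>', html)        # lazy quantifier
negated = re.findall(r'<[^>]*>', html)   # "anything except >" between < and >

print(negated)           # ['<b>', '</b>', '<i>', '</i>']
print(lazy == negated)   # True
```

The negated class never has to back off or expand one character at a time, which is why it is often preferred when the delimiter is a single character.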

4. Extending patterns: position (anchors)

An anchor consumes no characters; it only constrains the position. It states a condition: "the match must be here."

^ (start) and $ (end)

re.search(r'^hello', 'hello world')    # Match: the string starts with hello
re.search(r'^hello', 'say hello')      # None: hello is not at the start

re.search(r'world$', 'hello world')    # Match: the string ends with world
re.search(r'world$', 'world hello')    # None: world is not at the end
# Using ^ and $ together means "the whole string must fit this pattern"
re.search(r'^\d+$', '12345')     # Match: all digits
re.search(r'^\d+$', '123abc')    # None: contains non-digits
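
One detail worth checking when ^...$ is used for validation: $ also matches just before a single trailing newline. The sketch below shows this with an illustrative value and contrasts it with re.fullmatch (covered in section 6).

```python
import re

value = '12345\n'   # e.g. a line read from a file with its newline still attached

dollar_hit = re.search(r'^\d+$', value)
full_hit = re.fullmatch(r'\d+', value)

print(dollar_hit is not None)   # True: $ matches just before the final newline
print(full_hit is not None)     # False: the newline is not a digit
print(re.fullmatch(r'\d+', '12345') is not None)   # True
```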

\b: word boundary

Matches the boundary between a word character (\w) and a non-word character (\W), including the start or end of the string next to a word character. It consumes no characters.

re.findall(r'\bcat\b', 'cat catalog catfish the cat sat')
# ['cat', 'cat']
# the cat in 'catalog' and 'catfish' is not followed by a word boundary, so it does not match

re.findall(r'cat', 'cat catalog catfish the cat sat')
# ['cat', 'cat', 'cat', 'cat']
# without \b, partial matches inside longer words are found too
# In practice: replace one exact word only
re.sub(r'\bJava\b', 'Python', 'Java and JavaScript are different')
# 'Python and JavaScript are different'
# the Java in JavaScript is left alone

\A and \Z: absolute start and end

^ and $ are affected by the MULTILINE flag covered later, whereas \A and \Z always mean the absolute start and end of the string.
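
The difference only shows up once MULTILINE is on, so here is a small sketch with a two-line string (the sample text is illustrative).

```python
import re

text = 'first line\nsecond line'

line_starts = re.findall(r'^\w+', text, re.MULTILINE)
string_start = re.findall(r'\A\w+', text, re.MULTILINE)
print(line_starts)    # ['first', 'second']
print(string_start)   # ['first']

line_ends = re.findall(r'\w+$', text, re.MULTILINE)
string_end = re.findall(r'\w+\Z', text, re.MULTILINE)
print(line_ends)      # ['line', 'line']
print(string_end)     # ['line']
```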


5. Extending patterns: groups

Wrapping part of a pattern in parentheses () makes it a group. A group does two jobs: it lets a quantifier apply to the whole unit, and it captures the text that unit matched.

Basic groups

# Without a group: ab+ = a followed by one or more b
re.findall(r'ab+', 'ab abb abab')
# ['ab', 'abb', 'ab', 'ab']

# With a group: (ab)+ = one or more repetitions of 'ab'
re.findall(r'(ab)+', 'ab abb abab')
# ['ab', 'ab', 'ab']

findall and groups

When the pattern contains a group, findall returns only the group contents. To get the whole match, use a non-capturing group (?:...) or remove the group.

re.findall(r'(\d+)-(\d+)', 'a1-2 b3-4')
# [('1', '2'), ('3', '4')]  (tuples of groups)

re.findall(r'\d+-\d+', 'a1-2 b3-4')
# ['1-2', '3-4']  (whole matches)

Extracting parts with capture groups

m = re.search(r'(\d+)-(\d+)-(\d+)', 'phone: 010-1234-5678')

m.group()    # '010-1234-5678'  whole match
m.group(0)   # '010-1234-5678'  same as group()
m.group(1)   # '010'            first group
m.group(2)   # '1234'           second group
m.group(3)   # '5678'           third group
m.groups()   # ('010', '1234', '5678')

Named groups (?P<name>...)

Groups can be accessed by name instead of by number. This is much more readable once a pattern gets complex.

m = re.search(
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})',
    'today is 2026-04-12'
)

m.group('year')    # '2026'
m.group('month')   # '04'
m.groupdict()      # {'year': '2026', 'month': '04', 'day': '12'}

Non-capturing groups (?:...)

These group without capturing, so they do not use up a group number.

# Group the http:// or https:// prefix without capturing it
re.findall(r'(?:https?://)\S+', 'visit http://a.com or https://b.com')
# ['http://a.com', 'https://b.com']

OR operator |

| means "or". Combined with a group, it limits the alternation to one part of the pattern.

re.findall(r'cat|dog', 'I have a cat and a dog')
# ['cat', 'dog']

# Limit the scope of the OR with a group
re.findall(r'(?:cat|dog)s?', 'cats and dogs and cat')
# ['cats', 'dogs', 'cat']

Backreferences

A captured group can be referenced again later in the same pattern.

# Find a word repeated twice in a row
re.search(r'\b(\w+)\s+\1\b', 'the the cat').group()
# 'the the'  (\1 matches the same text as the first group, 'the')

# Named backreference
re.search(r'\b(?P<word>\w+)\s+(?P=word)\b', 'the the cat').group()
# 'the the'
# Swap the order with backreferences in sub
re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')
# 'world hello'

# Convert a date format with named groups: YYYY-MM-DD → DD/MM/YYYY
re.sub(
    r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
    r'\g<d>/\g<m>/\g<y>',
    '2026-04-12'
)
# '12/04/2026'

6. Core functions of the re module

With the pattern syntax covered, this section summarizes the functions that put patterns to use.

Function summary

Function | Description | Returns
re.search(pattern, string) | scans the whole string, first match | Match or None
re.match(pattern, string) | matches only at the start of the string | Match or None
re.fullmatch(pattern, string) | the whole string must match the pattern | Match or None
re.findall(pattern, string) | all non-overlapping matches as a list | list[str] (tuples if the pattern has several groups)
re.finditer(pattern, string) | all matches as an iterator | iterator[Match]
re.sub(pattern, repl, string) | replaces the matched parts | str
re.split(pattern, string) | splits on the pattern | list[str]
re.compile(pattern) | precompiles the pattern | Pattern object

search vs match vs fullmatch

text = 'abc123def'

re.search(r'\d+', text)      # Match '123': first match anywhere
re.match(r'\d+', text)       # None: the string does not start with a digit
re.match(r'[a-z]+', text)    # Match 'abc': the string starts with lowercase letters

re.fullmatch(r'\d+', '123')  # Match: the whole string is digits
re.fullmatch(r'\d+', '123a') # None: the whole string does not match

You can think of match as having an implicit ^ at the front of the pattern; it anchors at the start of the string but does not require the match to reach the end.
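
The "implicit ^" picture has one limit: under MULTILINE, ^ can match at the start of any line, while match still only tries the very beginning of the string. The sketch below illustrates this with a made-up two-line input.

```python
import re

text = 'abc\n123'

# With MULTILINE, ^ matches at the start of the second line as well
search_hit = re.search(r'^\d+', text, re.MULTILINE)
print(search_hit.group())   # '123'

# match only ever tries position 0, regardless of MULTILINE
match_hit = re.match(r'\d+', text, re.MULTILINE)
print(match_hit)            # None

# match does not need to consume the whole string
prefix_hit = re.match(r'[a-z]+', 'abc123')
print(prefix_hit.group())   # 'abc'
```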

findall vs finditer

re.findall(r'\d+', 'a1 b22 c333')
# ['1', '22', '333']

for m in re.finditer(r'\d+', 'a1 b22 c333'):
    print(f"span {m.span()}: {m.group()}")
# span (1, 2): 1
# span (4, 6): 22
# span (8, 11): 333

finditer yields Match objects one at a time, so it is the better fit when you need position information or want to keep memory use low on large inputs.

sub: substitution

re.sub(r'\d+', 'NUM', 'a1 b22')
# 'aNUM bNUM'

# Passing a function allows a dynamic replacement for each match
re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b2')
# 'a2 b4'

split: splitting

re.split(r'[,;]\s*', 'one, two;three,  four')
# ['one', 'two', 'three', 'four']

compile: reusing a pattern

pattern = re.compile(r'\b\w{3}\b')

pattern.findall('the cat sat on a mat')
# ['the', 'cat', 'sat', 'mat']

# Compiling once pays off when the pattern is used repeatedly
log_lines = ['the cat sat', 'zz']   # hypothetical stand-in for a large log file
for line in log_lines:
    if pattern.search(line):
        print(line)                 # hypothetical stand-in for real processing

Key Match object methods

m = re.search(r'(\d+)-(\d+)', 'code: 123-456')

m.group()       # '123-456'      whole match
m.group(1)      # '123'          first group
m.group(2)      # '456'          second group
m.groups()      # ('123', '456') all groups
m.start()       # 6              start index of the match
m.end()         # 13             end index of the match
m.span()        # (6, 13)        (start, end)

7. Flags

Flags change how a pattern behaves.

Flag | Short | Inline | Description
re.IGNORECASE | re.I | (?i) | ignore case
re.MULTILINE | re.M | (?m) | ^ and $ also match at the start and end of each line
re.DOTALL | re.S | (?s) | . also matches a newline (\n)
re.VERBOSE | re.X | (?x) | allow whitespace and comments (readability)
re.ASCII | re.A | (?a) | restrict \w, \d, etc. to ASCII

IGNORECASE

re.findall(r'hello', 'Hello HELLO hello', re.I)
# ['Hello', 'HELLO', 'hello']

MULTILINE

text = "hello world\nhello python"

re.findall(r'^hello', text)           # ['hello']  (first line only)
re.findall(r'^hello', text, re.M)     # ['hello', 'hello']  (every line)

DOTALL

text = '<div>\nhello\n</div>'

re.search(r'<div>.*</div>', text)             # None: . does not match \n
re.search(r'<div>.*</div>', text, re.S)       # Match: . matches \n too

VERBOSE: commenting a pattern

This makes a complex pattern easier to read.

email_pattern = re.compile(r"""
    [a-zA-Z0-9._%+-]+    # user name
    @                     # @ separator
    [a-zA-Z0-9.-]+       # domain
    \.[a-zA-Z]{2,}       # TLD (.com, .kr, etc.)
""", re.VERBOSE)

Combining flags and inline flags

# Combine several flags with |
pattern = re.compile(r'hello', re.IGNORECASE | re.MULTILINE)

# Set a flag inline, inside the pattern
re.findall(r'(?i)hello', 'Hello HELLO hello')
# ['Hello', 'HELLO', 'hello']

# Apply a flag to one group only
re.findall(r'(?i:hello) WORLD', 'Hello WORLD hello WORLD')
# ['Hello WORLD', 'hello WORLD']
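
As a usage sketch for this section, the block below combines VERBOSE with IGNORECASE on an email pattern, so the character classes only need lowercase ranges. It then writes the same two flags inline. The sample inputs are made up.

```python
import re

# VERBOSE and IGNORECASE combined with |
email_pattern = re.compile(r"""
    [a-z0-9._%+-]+    # user name
    @                 # separator
    [a-z0-9.-]+       # domain
    \.[a-z]{2,}       # top-level domain
""", re.VERBOSE | re.IGNORECASE)

found = email_pattern.findall('contact User@Example.com or admin@test.co.kr')
print(found)   # ['User@Example.com', 'admin@test.co.kr']

# The same two flags written inline at the very start of the pattern
inline = re.compile(r'(?xi) [a-z]+ \d+   # letters, then digits')
codes = inline.findall('AB12 cd34')
print(codes)   # ['AB12', 'cd34']
```

Note that in VERBOSE mode unescaped whitespace in the pattern is ignored, so a literal space has to be written as [ ] or as an escaped space.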

8. Lookaround (Lookahead / Lookbehind)

Like anchors, these check a condition without consuming characters. They mean "match only if this comes before or after."

Pattern | Name | Meaning
(?=Y) | positive lookahead | match only if followed by Y
(?!Y) | negative lookahead | match only if not followed by Y
(?<=Y) | positive lookbehind | match only if preceded by Y
(?<!Y) | negative lookbehind | match only if not preceded by Y

Lookahead

# Match 'foo' only when it is followed by 'bar'
re.findall(r'foo(?=bar)', 'foobar foobaz foo')
# ['foo']  (only the foo in foobar)

# Match 'foo' only when it is not followed by 'bar'
re.findall(r'foo(?!bar)', 'foobar foobaz foo')
# ['foo', 'foo']  (the foo in foobaz, and the standalone foo)

Lookbehind

# Match digits only when preceded by '$'
re.findall(r'(?<=\$)\d+', 'price $100, count 50')
# ['100']

# Match digits only when not preceded by '$'
re.findall(r'(?<!\$)\d+', 'price $100, count 50')
# ['00', '50']
# in 100, the 1 right after $ is rejected and the remaining 00 matches

Lookbehind restriction

Lookbehind in the re module allows only fixed-width patterns. A variable-length pattern such as (?<=\w+) raises an error. If you need variable length, use the third-party regex module.
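
The sketch below shows the restriction in action and one possible workaround that stays within re: capturing the wanted part with a group instead of using a lookbehind. The workaround pattern is an illustration, not the only option.

```python
import re

rejected = False
try:
    re.compile(r'(?<=\w+)\d+')   # variable-length lookbehind
except re.error as exc:
    rejected = True
    print('rejected:', exc)

# A capture group can often do the same job without a lookbehind
digits = re.findall(r'\b[a-z]+(\d+)', 'pay5 dot3')
print(digits)   # ['5', '3']
```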

In practice: password strength check

Combining several lookaheads produces a pattern that requires several conditions to hold at once.

# At least 8 characters, with an uppercase letter, a lowercase letter, and a digit
pattern = r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$'

bool(re.match(pattern, 'Hello123!'))   # True
bool(re.match(pattern, 'hello123'))    # False (no uppercase)
bool(re.match(pattern, 'HELLO123'))    # False (no lowercase)
bool(re.match(pattern, 'Helloabc'))    # False (no digit)
bool(re.match(pattern, 'Hi1!'))        # False (fewer than 8 characters)

Each (?=.*X) imposes the condition "X must appear somewhere" without consuming any input. That is why several conditions can be written side by side.
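
To see what the stacked lookaheads amount to, the sketch below writes the same rules as separate checks and confirms both approaches agree on the sample inputs. The helper name is_strong is hypothetical, and the equivalence is only claimed for single-line input.

```python
import re

PASSWORD = re.compile(r'^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$')

def is_strong(password: str) -> bool:
    """Same rules as PASSWORD, written as independent checks."""
    return (
        len(password) >= 8
        and re.search(r'[A-Z]', password) is not None
        and re.search(r'[a-z]', password) is not None
        and re.search(r'\d', password) is not None
    )

samples = ['Hello123!', 'hello123', 'HELLO123', 'Helloabc', 'Hi1!']
results = [(s, bool(PASSWORD.match(s)), is_strong(s)) for s in samples]
for sample, by_regex, by_checks in results:
    print(f'{sample!r}: regex={by_regex} checks={by_checks}')
```

The separate checks are easier to extend with per-rule error messages, while the single pattern is convenient where only one regex can be supplied.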


9. Practical pattern collection

A collection of commonly used patterns, ready to copy and adapt.

# Email
email = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'
re.findall(email, 'contact user@example.com or admin@test.co.kr')
# ['user@example.com', 'admin@test.co.kr']

# URL
url = r'https?://\S+'
re.findall(url, 'visit https://example.com/path?q=1 for more')
# ['https://example.com/path?q=1']

# IPv4 address
ipv4 = r'\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b'
re.findall(ipv4, 'server: 192.168.1.1, invalid: 999.999.999.999')
# ['192.168.1.1']

# Korean phone number
phone_kr = r'0\d{1,2}-\d{3,4}-\d{4}'
re.findall(phone_kr, 'contact: 010-1234-5678, 02-123-4567')
# ['010-1234-5678', '02-123-4567']

# Date (YYYY-MM-DD)
date_iso = r'\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])'
re.findall(date_iso, 'period: 2026-01-15 ~ 2026-04-12')
# ['2026-01-15', '2026-04-12']

# Extract Hangul only
korean = r'[가-힣]+'
re.findall(korean, '안녕하세요 hello 세계 world')
# ['안녕하세요', '세계']

# Strip HTML tags
html_strip = r'<[^>]+>'
re.sub(html_strip, '', '<p>Hello <b>world</b></p>')
# 'Hello world'

Limits in practice

Fully validating emails, URLs, and similar formats is hard with regex alone. In real projects it is better to combine regex with dedicated libraries (email-validator, urllib.parse, ipaddress).
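
One way to combine the two, sketched below with only the standard library: use a loose regex to find candidates, then let ipaddress or urllib.parse decide. The helper names is_ipv4 and looks_like_http_url are hypothetical, and the URL check is deliberately minimal.

```python
import ipaddress
import re
from urllib.parse import urlparse

def is_ipv4(text: str) -> bool:
    """Let the ipaddress module decide whether text is a valid IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

def looks_like_http_url(text: str) -> bool:
    """Minimal structural check: http(s) scheme plus a non-empty host part."""
    parts = urlparse(text)
    return parts.scheme in ('http', 'https') and bool(parts.netloc)

# Loose regex finds candidates; the library does the real validation
log = 'server: 192.168.1.1, invalid: 999.999.999.999'
candidates = re.findall(r'\b\d{1,3}(?:\.\d{1,3}){3}\b', log)
valid = [c for c in candidates if is_ipv4(c)]

print(candidates)   # ['192.168.1.1', '999.999.999.999']
print(valid)        # ['192.168.1.1']
print(looks_like_http_url('https://example.com/path?q=1'))   # True
print(looks_like_http_url('example.com'))                    # False
```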


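10. Performance optimization

Using compile

Precompiling a pattern that is used repeatedly helps performance. The re module does cache recently compiled patterns internally (up to 512 in current CPython), but compiling explicitly avoids the cache lookup on every call and makes the reuse obvious.

pattern = re.compile(r'\d{3}-\d{4}')

lines = ['call 555-1234', 'no number here']   # hypothetical stand-in for a large file
for line in lines:
    if pattern.search(line):
        print(line)                            # hypothetical stand-in for real processing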

Catastrophic backtracking

With nested quantifiers, a failed match can force the engine to try an exponentially growing number of ways to split the input.

# ❌ Dangerous pattern: failure time grows exponentially with input length
bad = r'(a+)+b'
# A short input such as 'aaaaaaaaaaac' still fails quickly, but with a few dozen
# a's before the 'c', confirming the failure can take seconds to minutes
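
The growth is easy to observe on small inputs. The sketch below times the failing search for a few lengths; the sizes are kept small on purpose so it finishes quickly, and the exact timings depend on the machine and Python version.

```python
import re
import time

bad = re.compile(r'(a+)+b')

timings = []
for n in (14, 16, 18, 20):
    text = 'a' * n + 'c'              # no 'b' anywhere, so the search must fail
    start = time.perf_counter()
    result = bad.search(text)
    elapsed = time.perf_counter() - start
    timings.append((n, result, elapsed))
    print(f'n={n:2d}  matched={result is not None}  {elapsed:.4f}s')
```

Each extra pair of a's should make the failing search several times slower, which is the signature of exponential backtracking.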

Fixes:

# Simplify the pattern
good = r'a+b'

# Possessive quantifier (Python 3.11+): blocks backtracking
good = r'a++b'

# Atomic group (Python 3.11+)
good = r'(?>a+)b'

Performance tips

  • Where possible, consider plain string operations such as str.startswith() or in first.
  • finditer() is more memory-efficient than findall().
  • Use non-capturing groups (?:...) to avoid unnecessary captures.
  • Prefer a specific character class ([^,\n]+) over the dot.

11. Changes by version

Version | Change
3.6 | re.LOCALE restricted to bytes patterns
3.7 | re.Pattern and re.Match available for use in type hints
3.8 | \N{name} Unicode name escape supported
3.11 | possessive quantifiers (*+, ++, ?+, {m,n}+) added
3.11 | atomic groups ((?>...)) added
3.12 | invalid escape sequences in string literals emit SyntaxWarning instead of DeprecationWarning; planned to become SyntaxError in a future version

The possessive quantifiers and atomic groups added in 3.11 are the biggest change.


12. re vs regex (third-party) comparison

When the standard re module is not enough, the third-party regex module (pip install regex) is an option.

Feature | re (standard) | regex (third-party)
Installation | built in | pip install regex
Possessive quantifiers | 3.11+ | all versions
Atomic groups | 3.11+ | all versions
Variable-length lookbehind | no (fixed width only) | supported
Unicode properties \p{L} | no | supported
Fuzzy matching | no | supported
Overlapped matching | no | overlapped=True
Recursive patterns | no | (?0), (?&name)

import regex

# Variable-length lookbehind (not possible in re)
regex.findall(r'(?<=\b\w+)\d+', 'pay5 dot3')
# ['5', '3']

# Extract Hangul with a Unicode property
regex.findall(r'\p{Hangul}+', '안녕하세요 hello 세계')
# ['안녕하세요', '세계']

# Overlapped matching
regex.findall(r'\w{2}', 'apple', overlapped=True)
# ['ap', 'pp', 'pl', 'le']

# Fuzzy matching (allow up to 1 error)
regex.search(r'(?:hello){e<=1}', 'helo')   # Match

When to use the regex module

  • When you need variable-length lookbehind
  • When you need Unicode properties such as \p{Hangul} or \p{Greek}
  • When you need fuzzy matching (typo-tolerant search)
  • Otherwise the standard re module is enough

13. Common pitfalls and caveats

Forgetting the raw string

re.search('\bword\b', 'a word here')     # None: \b is interpreted as a backspace
re.search(r'\bword\b', 'a word here')    # Match

Since Python 3.12, an invalid escape such as '\d' in a non-raw string literal emits a SyntaxWarning (previously a DeprecationWarning).

match() only looks at the start

re.match(r'\d+', 'abc123')     # None
re.search(r'\d+', 'abc123')    # Match '123'

To check whether something appears anywhere in the string, use search.

findall() returns only the groups when the pattern has groups

re.findall(r'(\d+)-(\d+)', '1-2 3-4')
# [('1', '2'), ('3', '4')]  (tuples of groups, not the whole match)

re.findall(r'\d+-\d+', '1-2 3-4')
# ['1-2', '3-4']  (whole matches)

Multi-line matching without DOTALL

text = '<div>\nhello\n</div>'

re.search(r'<div>.*</div>', text)             # None
re.search(r'<div>.*</div>', text, re.DOTALL)  # Match

Matching a literal backslash

To match a single \ in the text, the pattern needs \\.

re.search(r'\\', r'path\to')    # Match
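
The sketch below spells out the two layers of escaping on an illustrative Windows-style path: the raw pattern needs two backslashes, and the equivalent non-raw literal needs four.

```python
import re

path = r'C:\temp\new'   # raw string, so \t and \n stay as backslash + letter

# Raw pattern: the two characters \\ describe one literal backslash
raw_hits = re.findall(r'\\', path)
print(len(raw_hits))    # 2

# Non-raw literal: four backslashes in the source produce the same regex
plain_hits = re.findall('\\\\', path)
print(plain_hits == raw_hits)   # True

parts = re.split(r'\\', path)
print(parts)            # ['C:', 'temp', 'new']
```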