I'm new to regular expressions and I'm having trouble understanding them. I have these input strings:
Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)
I want to get from each string:
Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%
I use this regex: \((.*?)\)$
However, for the input
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
it returns
UK) - Carling Original (Lager - Pale. ABV 3,7%
I cannot figure out what to add to my regex to get only
Lager - Pale. ABV 3,7%
CodePudding user response:
To support at most one level of nested parentheses, you can use the following regex: \(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$
import re
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
rx = re.compile(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$')
for line in text.splitlines(True):
    m = rx.search(line)
    if m:
        print(m.group(1))
Details:
\( - a ( char
([^()]*(?:\([^()]*\)[^()]*)*) - Group 1: zero or more chars other than ( and ), then zero or more sequences of a ( char, zero or more chars other than ( and ), and then a ) char, followed by zero or more chars other than ( and )
\) - a ) char
\s*$ - zero or more whitespaces and end of string.
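To see why the \((.*?)\)$ pattern from the question over-matches: the engine searches left to right, and the lazy .*? only keeps the match as short as possible from the first ( it finds; it never moves the starting point. Below is a minimal sketch comparing both patterns. The helper name last_parens and the 'X (a (b (c)) d)' sample are made up for illustration.

```python
import re

# Pattern from the question: lazy, but the match still starts at the leftmost "(".
naive = re.compile(r'\((.*?)\)$')
# Pattern from this answer: tolerates one level of nested parentheses.
one_level = re.compile(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$')

def last_parens(rx, line):
    """Return group 1 of the first match of rx in line, or None (illustrative helper)."""
    m = rx.search(line)
    return m.group(1) if m else None

line = 'Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)'
print(last_parens(naive, line))      # UK) - Carling Original (Lager - Pale. ABV 3,7%
print(last_parens(one_level, line))  # Lager - Pale. ABV 3,7%

# One nested level is handled:
nested = 'Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)'
print(last_parens(one_level, nested))  # Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15

# Two nested levels are not: no balanced match reaches the end of the string.
print(last_parens(one_level, 'X (a (b (c)) d)'))  # None
```

Excluding ( and ) from the character classes is what prevents the match from running across the earlier (UK) group.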
To support any number of nesting levels, you cannot use re, since it does not support recursion. You can pip install regex and use:
import regex
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
rx = regex.compile(r'(\(((?:[^()]++|(?1))*)\))\s*$')
for line in text.splitlines(True):
    m = rx.search(line)
    if m:
        print(m.group(2))
Details:
(\(((?:[^()]++|(?1))*)\)) - Group 1: a ( char, then Group 2 capturing zero or more sequences of either one or more chars other than ( and ), or the Group 1 pattern (recursed with (?1)), then a ) char
\s*$ - zero or more whitespaces and end of string.
Output:
Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%
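If installing a third-party package is not an option, arbitrary nesting can also be handled without a regex by scanning backwards from the final ) and counting depth. This is a different technique from the recursive pattern above, not an equivalent of it: it assumes the line ends with the closing parenthesis of the group you want. The function name trailing_group is made up for illustration.

```python
def trailing_group(line):
    """Return the text inside the balanced (...) that ends the line, or None."""
    s = line.rstrip()
    if not s.endswith(')'):
        return None
    depth = 0
    # Walk backwards from the final ")" until its matching "(" is found.
    for i in range(len(s) - 1, -1, -1):
        if s[i] == ')':
            depth += 1
        elif s[i] == '(':
            depth -= 1
            if depth == 0:
                # Drop the outer "(" and ")" themselves.
                return s[i + 1:-1]
    return None  # parentheses are unbalanced

print(trailing_group('Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)'))
# Lager - Pale. ABV 3,7%
print(trailing_group('Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)'))
# Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
print(trailing_group('X (a (b (c)) d)'))
# a (b (c)) d
```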