Regex: get string between two brackets (Python)

Time:05-14

I'm new to regular expressions and I have some trouble understanding them. Here are some input strings:

Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)

From each string I want to get:

Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%

I use this regex: \((.*?)\)$, but in the case of

Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)

it returns

UK) - Carling Original (Lager - Pale. ABV 3,7%

I can't figure out what to add to my regex to get only

Lager - Pale. ABV 3,7%
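A short sketch of why the original pattern overshoots: re.search tries start positions from left to right, and the first ( it can start from is the one before UK. The lazy .*? only keeps the match short from that start; it cannot move the start itself. Forbidding parentheses inside the captured text with [^()]* fixes the Molson Coors line, but (as the sketch shows on the first input line) it then finds nothing when the final group contains a nested pair. The variable names are illustrative only.

```python
import re

line = "Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)"
nested = "Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)"

# Original pattern: the lazy .*? still starts at the leftmost "(" that can
# lead to a match, so it spans from "(UK" to the final ")".
lazy = re.search(r'\((.*?)\)$', line)
print(lazy.group(1))  # UK) - Carling Original (Lager - Pale. ABV 3,7%

# Forbidding parentheses inside the group forces the match to start at the
# last "(" on the line, which works when there is no nesting...
flat = re.compile(r'\(([^()]*)\)$')
print(flat.search(line).group(1))  # Lager - Pale. ABV 3,7%

# ...but a nested pair inside the final group makes it fail entirely.
print(flat.search(nested))  # None
```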

Answer:

To support at most one level of nesting, you can use the \(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$ regex (see the regex demo).

import re
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
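# Group 1 captures the contents of the last (...) group on the line,
# allowing at most one level of nested (...) pairs inside it
rx = re.compile(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)\s*$')
for line in text.splitlines(True):  # True keeps line breaks; \s*$ absorbs them
    m = rx.search(line)
    if m:
        print(m.group(1))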

See the Python demo. Details:

  • \( - a ( char
  • ([^()]*(?:\([^()]*\)[^()]*)*) - Group 1: zero or more chars other than ( and ), then zero or more sequences of: a ( char, zero or more chars other than ( and ), a ) char, and zero or more chars other than ( and )
  • \) - a ) char
  • \s*$ - zero or more whitespace chars and the end of string.
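The breakdown above maps one-to-one onto a re.VERBOSE version of the same pattern, sketched here so each piece carries its own comment. The helper name last_group and the line "Beer (Style (Sub (Detail)) ABV 5%)" are made up for illustration; the second call shows the limit of this approach, since two levels of nesting inside the final group make the pattern find nothing.

```python
import re

# The single-nesting pattern from the answer, written with re.VERBOSE so
# whitespace and comments inside the pattern are ignored.
ONE_LEVEL = re.compile(r'''
    \(                    # the opening "(" of the last group
    (                     # Group 1: the contents we want
        [^()]*            # text without parentheses
        (?:
            \( [^()]* \)  # one nested pair, itself without parentheses
            [^()]*        # more text without parentheses
        )*                # any number of such nested pairs
    )
    \)                    # the closing ")"
    \s*$                  # optional trailing whitespace, then end of string
''', re.VERBOSE)


def last_group(line):
    """Return Group 1 of ONE_LEVEL, or None if the pattern does not match."""
    m = ONE_LEVEL.search(line)
    return m.group(1) if m else None


print(last_group("Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)"))
# Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15

# Hypothetical line with two levels of nesting inside the final group:
print(last_group("Beer (Style (Sub (Detail)) ABV 5%)"))  # None
```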

To support any depth of nesting, you cannot use re, since it does not support recursion. Instead, you can pip install regex and use:

import regex
text = '''Coven - GLAM (Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15)
Molson Coors (UK) - Carling Original (Lager - Pale. ABV 3,7%)
Barista Chocolate Quad (Belgian Quadrupel. ABV 11%)
4Пивовара - Black Jesus White Pepper (Porter - American. OG 17, ABV 6.7%, IBU 69)
4Пивовара - Ether [Melon] (Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40)
Кер Сари Пшеничное (Wheat Beer - Other. ABV 4,5%)
Butch & Dutch - IPA 100 IBU (IPA - International. ABV 7%, IBU 100)
Trappistes Rochefort 6 (Belgian Dubbel. ABV 7,5%, IBU 22)
Fournier - Frères Producteurs - Eleveurs - Cidre Rose (Cider - Rose. ABV 3%)
Shepherd Neame - Classic Collection - India Pale Ale (IPA - English. OG 14,62%, ABV 6,1%)'''
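# (?1) recurses into Group 1, so nested (...) groups of any depth are matched;
# [^()]++ is a possessive run of non-parenthesis chars (no backtracking into it)
rx = regex.compile(r'(\(((?:[^()]++|(?1))*)\))\s*$')
for line in text.splitlines(True):
    m = rx.search(line)
    if m:
        print(m.group(2))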

See the Python demo. Details:

  • (\(((?:[^()]++|(?1))*)\)) - Group 1: a ( char, then Group 2, which captures zero or more occurrences of either one or more chars other than ( and ) or a recursion into the Group 1 pattern, and then a ) char
  • \s*$ - zero or more whitespace chars and the end of string.
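If installing the third-party regex module is not an option, the same "last balanced group" can be found without regular expressions by scanning backwards from the final ) and counting nesting depth. This is a different technique from the recursive pattern in the answer, and last_balanced_group is a hypothetical helper name. The sketch assumes the wanted group is the one that closes the line and that other bracket types such as [Melon] can be ignored.

```python
def last_balanced_group(line):
    """Return the text inside the parenthesized group that ends the line.

    Hypothetical helper: scans backwards from the final ")" and counts
    nesting depth, so any number of nested levels is handled. Returns None
    if the line does not end with ")" or the parentheses are unbalanced.
    """
    s = line.rstrip()  # mirror the \s*$ used in the regex versions
    if not s.endswith(")"):
        return None
    depth = 0
    for i in range(len(s) - 1, -1, -1):
        ch = s[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                # s[i] is the "(" that matches the final ")"
                return s[i + 1:-1]
    return None  # more ")" than "(" before reaching the start


print(last_balanced_group("Beer (Style (Sub (Detail)) ABV 5%)"))
# Style (Sub (Detail)) ABV 5%
```

The trade-off is explicit control flow instead of a compact pattern, with no dependency beyond the standard library.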

Output:

Lager - IPL (India Pale Lager). ABV 5.5%, IBU 15
Lager - Pale. ABV 3,7%
Belgian Quadrupel. ABV 11%
Porter - American. OG 17, ABV 6.7%, IBU 69
Sour - Farmhouse IPA OG 17, ABV 6.5%, IBU 40
Wheat Beer - Other. ABV 4,5%
IPA - International. ABV 7%, IBU 100
Belgian Dubbel. ABV 7,5%, IBU 22
Cider - Rose. ABV 3%
IPA - English. OG 14,62%, ABV 6,1%