Can I use regex to specify a pattern that matches the correct number of opening and closing brackets

Time:10-23

Suppose I have the following string:

Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],...

I can extract the value of Prop1 very easily, with the pattern (?<=Prop1=).*?(?=,).

For Prop2 I want to extract [Prop2_1=b,Prop2_2=c], and for Prop3, I want to extract [Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d].

But here's the thing: I don't know in advance if what I'm looking for is nested, or how many degrees of nesting there are.
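For illustration, here is a minimal sketch using Python's built-in re module. The helper name lazy_value is hypothetical. It shows why the lookaround pattern works for Prop1 but stops at the first comma inside the brackets for Prop2 and Prop3:

```python
import re

text = "Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],..."


def lazy_value(name, s):
    """Hypothetical helper: apply the question's lookaround pattern to one property."""
    # re.escape guards against regex metacharacters in the property name.
    m = re.search(r"(?<=" + re.escape(name) + r"=).*?(?=,)", s)
    return m.group() if m else None


for name in ("Prop1", "Prop2", "Prop3"):
    # The lazy .*? stops at the first comma, even one inside [...].
    print(name, "->", lazy_value(name, text))
```

Prop1 comes out as a. Prop2 comes out as [Prop2_1=b and Prop3 as [Prop3_1=[Prop3_2_1=e, because the pattern has no notion of bracket depth.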

Is there a single regular expression that can handle the general case?

Edit: I have been reminded that I need to specify which regex flavour I'm using. It's Python (import re).

Answer:

You cannot use re because it does not support recursion or regex subroutine calls. You need to install the PyPI regex module with pip install regex and then use import regex as re (or import regex and then use regex instead of re).
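If installing a third-party module is not an option, the same extraction can be done without any regex. The approach is to scan the string and count bracket depth. This is a swapped-in technique, not what this answer proposes. The function name extract_value is hypothetical. The sketch also assumes brackets are always balanced and that values never contain quoted or escaped brackets.

```python
text = "Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],..."


def extract_value(s, name):
    """Hypothetical helper: return the value of `name` in `s`, nested brackets included."""
    key = name + "="
    pos = s.find(key)
    # Skip hits that are the tail of a longer name (e.g. "XProp1="): a real key
    # starts the string or follows a "," or "[".
    while pos > 0 and s[pos - 1] not in ",[":
        pos = s.find(key, pos + 1)
    if pos == -1:
        return None
    start = pos + len(key)
    if s.startswith("[", start):
        depth = 0
        for i in range(start, len(s)):
            if s[i] == "[":
                depth += 1
            elif s[i] == "]":
                depth -= 1
                if depth == 0:  # this "]" closes the opening "["
                    return s[start:i + 1]
        return None  # unbalanced input: no matching "]"
    # Plain value: runs up to the next "," or "]" (or the end of the string).
    end = start
    while end < len(s) and s[end] not in ",]":
        end += 1
    return s[start:end]


for name in ("Prop1", "Prop2", "Prop3", "Prop3_2"):
    print(name, "->", extract_value(text, name))
```

This sketch stops a plain value at ] as well as at a comma, which differs from the [^,]* branch of the pattern below. As a result, a nested key such as Prop3_2 yields d rather than d].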

The pattern you can use is:

Prop3=\K(?:(\[(?:[^][]++|(?1))*])|[^,]*)

Details:

  • Prop3= - Prop3= text
  • \K - match reset operator that discards the text matched so far
  • (?:(\[(?:[^][]++|(?1))*])|[^,]*) - a non-capturing group that matches either:
    • (\[(?:[^][]++|(?1))*]) - Group 1: a [ char, then zero or more repetitions of either one or more chars other than [ and ] (matched possessively) or the whole Group 1 pattern recursed, and then a ] char
    • | - or
    • [^,]* - zero or more chars other than ,

import regex  # third-party module: pip install regex

text = "Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],..."

print(regex.search(r'Prop1=\K(?:(\[(?:[^][]++|(?1))*])|[^,]*)', text).group())
# => a
print(regex.search(r'Prop2=\K(?:(\[(?:[^][]++|(?1))*])|[^,]*)', text).group())
# => [Prop2_1=b,Prop2_2=c]
print(regex.search(r'Prop3=\K(?:(\[(?:[^][]++|(?1))*])|[^,]*)', text).group())
# => [Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d]
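If the maximum nesting depth is known in advance, the built-in re module can approximate the recursion by expanding the bracket pattern a fixed number of times. This is only an approximation under that assumption. MAX_DEPTH, bracket_pattern and extract are hypothetical names, and blocks nested deeper than the limit are not matched in full. Because re does not support \K, a capturing group holds the value instead.

```python
import re

text = "Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],..."

MAX_DEPTH = 5  # hypothetical limit: blocks nested deeper than this are not matched


def bracket_pattern(depth):
    """Build a regex for a [...] block with at most `depth` levels of brackets."""
    inner = r"[^\[\]]*"  # innermost level: no brackets at all
    for _ in range(depth - 1):
        # Each pass allows one more level: a non-bracket char or a nested block.
        inner = r"(?:[^\[\]]|\[" + inner + r"\])*"
    return r"\[" + inner + r"\]"


def extract(s, name, depth=MAX_DEPTH):
    """Hypothetical helper: group 1 is a bracketed block or plain text up to the next comma."""
    pattern = re.escape(name) + r"=(" + bracket_pattern(depth) + r"|[^,]*)"
    m = re.search(pattern, s)
    return m.group(1) if m else None


for name in ("Prop1", "Prop2", "Prop3"):
    print(name, "->", extract(text, name))
```

With too small a limit, for example extract(text, "Prop3", depth=1), the bracket branch fails and the [^,]* branch returns the truncated [Prop3_1=[Prop3_2_1=e. That is why the recursive (?1) in the regex module is the general solution.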
