Suppose I have the following string:
Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],...
I can extract the value of Prop1 very easily, with the pattern (?<=Prop1=).*?(?=,).
For Prop2, I want to extract [Prop2_1=b,Prop2_2=c], and for Prop3, I want to extract [Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d].
But here's the thing: I don't know in advance if what I'm looking for is nested, or how many degrees of nesting there are.
Is there a single regular expression that can handle the general case?
Edit: I have been reminded that I need to specify which flavour of regex. I'm using Python (import re).
CodePudding user response:
You cannot use re because it does not support recursion or regex subroutines. You need to install the PyPI regex module using pip install regex and then use import regex as re (or import regex and then use regex instead of re).
The pattern you can use is
Prop3=\K(?:(\[(?:[^][]+|(?1))*])|[^,]*)
See the regex demo. Details:
Prop3= - the literal text Prop3=
\K - match reset operator that discards the text matched so far
(?:(\[(?:[^][]+|(?1))*])|[^,]*) - a non-capturing group that matches either of two alternatives:
    (\[(?:[^][]+|(?1))*]) - Group 1: a [ char, then zero or more repetitions of either one or more chars other than [ and ] or the whole Group 1 pattern recursed, and then a ] char
    | - or
    [^,]* - zero or more chars other than ,
import regex  # third-party module: pip install regex

text = "Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],..."
# Same pattern each time; only the property name before \K changes
print( regex.search(r'Prop1=\K(?:(\[(?:[^][]+|(?1))*])|[^,]*)', text).group() )
# => a
print( regex.search(r'Prop2=\K(?:(\[(?:[^][]+|(?1))*])|[^,]*)', text).group() )
# => [Prop2_1=b,Prop2_2=c]
print( regex.search(r'Prop3=\K(?:(\[(?:[^][]+|(?1))*])|[^,]*)', text).group() )
# => [Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d]
See the Python demo online.
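If installing a third-party module is not an option, the same bracket balancing can be done by hand with a depth counter. The following is only a minimal sketch using the standard library, not the recursive-regex approach above: the function name extract_value is hypothetical, and it assumes that brackets in the input are balanced and that a property name is only valid at the start of the string or right after a , or [ char.

```python
def extract_value(text, name):
    """Return the value that follows `name=`, keeping bracketed values whole.

    Hypothetical helper: assumes balanced brackets and that a key starts
    the string or directly follows ',' or '['.
    """
    key = name + "="
    start = text.find(key)
    # Skip hits that are only the tail of a longer name (e.g. "XProp1=").
    while start > 0 and text[start - 1] not in ",[":
        start = text.find(key, start + 1)
    if start == -1:
        return None

    pos = start + len(key)
    depth = 0  # how many '[' are currently open inside the value
    for i in range(pos, len(text)):
        ch = text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                # This ']' closes an enclosing group, so the value ends here.
                return text[pos:i]
            depth -= 1
        elif ch == "," and depth == 0:
            # A top-level comma separates this property from the next one.
            return text[pos:i]
    return text[pos:]  # the value runs to the end of the string


text = "Prop1=a,Prop2=[Prop2_1=b,Prop2_2=c],Prop3=[Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d],..."
print(extract_value(text, "Prop1"))    # => a
print(extract_value(text, "Prop2"))    # => [Prop2_1=b,Prop2_2=c]
print(extract_value(text, "Prop3"))    # => [Prop3_1=[Prop3_2_1=e,Prop3_2_2=f,Prop3_2_3=g],Prop3_2=d]
print(extract_value(text, "Prop3_2"))  # => d
```

Note one deliberate difference from the regex: for an unbracketed value nested inside a group (such as Prop3_2 here), this sketch stops at the enclosing ] char, whereas the [^,]* branch of the pattern would include that ] in the match.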