How to extract first, second etc. occurrence of string using regular expression

I have a text like this:

{"Gender"=>["Woman"], "Color"=>["White"], "Season"=>["Spring"], "Brand"=>["Gucci"]}

I want to be able to extract the value related to "Gender", "Color" etc respectively.

I found an expression that finds the value between the brackets: (?<=\[")(.*?)(?=\"])

But how to combine it with some regular expression that targets more specifically the Brand-value (or any of the other "Named" values)?

The ideal thing would of course be some entry like Color.expression that returns White, and another expression Brand.expression that returns Gucci, etc.

But as the format of the string is quite stable, it would also be OK to address the first occurrence of what is inside the brackets ["xxx"], the second occurrence of what is inside the brackets ["yyy"], and so on (Group 1, Group 2, Group 3, Group 4?).

Any ideas?

CodePudding user response：

Regarding the last statement in your question: if the order is stable, you can use re.findall and assign the results with tuple unpacking, like this:

import re

string = '{"Gender"=>["Woman"], "Color"=>["White"], "Season"=>["Spring"], "Brand"=>["Gucci"]}'
pattern = r'(?<=\[")(.*?)(?=\"])'

# findall returns the bracketed values in order; unpacking fails if there are not exactly four
Gender, Color, Season, Brand = re.findall(pattern, string)

print(f"{Gender= } \n{Color= } \n{Season= } \n{Brand= }")
# Be aware that f-strings with the "=" specifier need Python 3.8 or higher
# Old syntax:
print('Gender= ', Gender, '\nColor= ', Color, '\nSeason= ', Season, '\nBrand= ', Brand)

Gender= 'Woman' 
Color= 'White' 
Season= 'Spring' 
Brand= 'Gucci'
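
If you would rather look values up by name than rely on their order, one possible variation is to capture the key and the value together and build a dict. This is only a sketch: the pattern and the names pair_pattern and values are illustrative assumptions, and it assumes every entry has the exact form "Key"=>["Value"] with a single value per key.

```python
import re

text = '{"Gender"=>["Woman"], "Color"=>["White"], "Season"=>["Spring"], "Brand"=>["Gucci"]}'

# Two capture groups: the quoted key, then the quoted value inside the brackets.
# (Illustrative pattern; assumes one value per key and no escaped quotes.)
pair_pattern = r'"([^"]+)"=>\["([^"]*)"\]'

# With two groups, findall returns a list of (key, value) tuples,
# which dict() accepts directly.
values = dict(re.findall(pair_pattern, text))

print(values["Color"])
print(values["Brand"])
```

This prints White and then Gucci, and a missing key raises KeyError instead of silently shifting the other values.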

CodePudding user response：

You can use this to extract all values:

(?<=[^[]\[\")([^\"]+)

To extract a specific value, e.g. the value related to Season:

(?<=Season\"=>\[\")([^\"]+)

(?<=Season\"=>\[\") - Positive lookbehind that checks for the literal Season"=>["
([^\"]+) - Capture group that extracts the value up to the next occurrence of a double quote character.
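
In Python, the key-specific pattern above could be wrapped in a small helper so that any key can be looked up. This is a rough sketch rather than a definitive implementation: extract_value is a hypothetical name, and the opening quote before the key is a small addition to the pattern so that a key such as "Season" does not also match a longer key that merely ends in Season.

```python
import re

text = '{"Gender"=>["Woman"], "Color"=>["White"], "Season"=>["Spring"], "Brand"=>["Gucci"]}'

def extract_value(text, key):
    """Return the bracketed value for key, or None if the key is absent.

    Hypothetical helper, assuming entries look like "Key"=>["Value"].
    """
    # re.escape guards against keys that contain regex metacharacters.
    # The lookbehind stays fixed-width, which Python's re module requires.
    pattern = r'(?<="' + re.escape(key) + r'"=>\[")[^"]+'
    match = re.search(pattern, text)
    return match.group(0) if match else None

print(extract_value(text, "Season"))
print(extract_value(text, "Brand"))
print(extract_value(text, "Size"))
```

The three calls print Spring, Gucci and None, so a key that is not present can be detected by the caller.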