I have one column in a dataframe with key-value pairs I would like to extract:
AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619
I would like to parse the key-value pairs like so:
('AF_ESP', '0.00546')
('AF_EXAC', '0.00165')
('AF_TGP', '0.00619')
Here is my regex.
([^=]+)=([^;]+)
This gets me most of the way there:
('AF_ESP', '0.00546')
(';AF_EXAC', '0.00165')
(';AF_TGP', '0.00619')
How can I adjust it so semicolons are not captured in the result?
CodePudding user response:
You can consume the semi-colon or start of string in front:
(?:;|^)([^=]+)=([^;]+)
See the regex demo. Details:
(?:;|^) - a non-capturing group matching ; or the start of the string
([^=]+) - Group 1: one or more chars other than =
= - a = char
([^;]+) - Group 2: one or more chars other than ;
See the Python demo:
import re
text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print( re.findall(r'(?:;|^)([^=]+)=([^;]+)', text) )
# => [('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
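To see why the original pattern picks up the semicolon, it may help to run both versions side by side. The sketch below also shows one possible alternative fix (not from the answer above): excluding ; from the key's character class, so the key cannot start on the separator left over from the previous match. The variable names are only for illustration.

```python
import re

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"

# Original pattern: after the first match ends, the next search starts at ';'.
# Since ';' is "not '='", [^=]+ happily includes it in the key.
original = re.findall(r'([^=]+)=([^;]+)', text)
print(original)
# => [('AF_ESP', '0.00546'), (';AF_EXAC', '0.00165'), (';AF_TGP', '0.00619')]

# Alternative fix (an assumption, not the answer's pattern):
# forbid ';' inside the key as well, so the key must start after the separator.
alternative = re.findall(r'([^=;]+)=([^;]+)', text)
print(alternative)
# => [('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
```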
A non-regex solution is also possible:
text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print( [x.split('=') for x in text.split(';')] )
# => [['AF_ESP', '0.00546'], ['AF_EXAC', '0.00165'], ['AF_TGP', '0.00619']]
See this Python demo.
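If the real data is less regular than the sample, a slightly more defensive variant of the split approach might be worth considering. This is only a sketch under two assumptions that the question does not confirm: that a value could itself contain =, and that the string could end with a trailing ;. The messy string is a made-up example.

```python
# Hypothetical input: a value containing '=' and a trailing ';'
messy = "A=1;B=x=y;"

pairs = [
    tuple(item.split('=', 1))   # maxsplit=1: split only on the first '='
    for item in messy.split(';')
    if item                     # skip the empty piece after a trailing ';'
]
print(pairs)
# => [('A', '1'), ('B', 'x=y')]
```

Note that an item with no = at all would still produce a one-element tuple here, so such entries would need separate handling.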
CodePudding user response:
This can also be solved with the split method:
text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print([tuple(i.split('=')) for i in text.split(';')])
Output:
[('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
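Since the values in the sample look numeric, a dict with float values may be more convenient than a list of tuples. Here is a minimal sketch, assuming every value parses as a float; parse_info is a hypothetical helper name, not something from the question.

```python
def parse_info(text):
    """Turn 'K1=V1;K2=V2' into a dict mapping each key to a float value."""
    result = {}
    for item in text.split(';'):
        # partition returns (key, '=', value); sep is '' when no '=' is present
        key, sep, value = item.partition('=')
        if sep:
            result[key] = float(value)  # assumes the value is numeric
    return result

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
parsed = parse_info(text)
print(parsed)
# => {'AF_ESP': 0.00546, 'AF_EXAC': 0.00165, 'AF_TGP': 0.00619}
```

For a dataframe column, a function like this could presumably be passed to the column's apply method to parse each row.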