Regex to Ignore Semicolon

I have a dataframe column containing key-value pairs that I would like to extract.

AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619

I would like to parse the key-value pairs like so:

('AF_ESP', '0.00546')
('AF_EXAC', '0.00165')
('AF_TGP', '0.00619')

Here is my regex.

([^=]+)=([^;]+)

This gets me most of the way there:

('AF_ESP', '0.00546')
(';AF_EXAC', '0.00165')
(';AF_TGP', '0.00619')

How can I adjust it so semicolons are not captured in the result?

CodePudding user response：

You can consume the semicolon or the start of the string in front of the key:

(?:;|^)([^=]+)=([^;]+)

Details:

(?:;|^) - a non-capturing group matching ; or the start of the string
([^=]+) - Group 1: one or more chars other than =
= - a = char
([^;]+) - Group 2: one or more chars other than ;

See the Python demo:

import re
text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print(re.findall(r'(?:;|^)([^=]+)=([^;]+)', text))
# => [('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
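
As an alternative sketch (not part of the answer above), excluding the semicolon from the key's character class should also keep it out of Group 1, without the leading non-capturing group. The name `PAIR_RE` is only an illustrative choice:

```python
import re

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"

# The key may contain neither '=' nor ';', so a match can never
# begin on the separator that follows the previous value.
PAIR_RE = re.compile(r'([^=;]+)=([^;]+)')

pairs = PAIR_RE.findall(text)
print(pairs)
# => [('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
```

This relies on the assumption that keys never legitimately contain a semicolon, which holds for the sample string.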

A non-regex solution is also possible:

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print( [x.split('=') for x in text.split(';')] )
# => [['AF_ESP', '0.00546'], ['AF_EXAC', '0.00165'], ['AF_TGP', '0.00619']]
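
If a value could itself contain `=`, or the string could end with a trailing `;`, the plain `split('=')` above would return lists of uneven length. A slightly more defensive sketch under those assumptions uses `str.partition`, which cuts each item only at its first `=`. The helper name `parse_pairs` is hypothetical:

```python
def parse_pairs(text):
    """Turn 'k1=v1;k2=v2' into a dict, cutting each item at its first '='."""
    result = {}
    for item in text.split(';'):
        if not item:  # skip empty pieces, e.g. from a trailing ';'
            continue
        key, _, value = item.partition('=')
        result[key] = value
    return result

info = parse_pairs("AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619;")
print(info)
# => {'AF_ESP': '0.00546', 'AF_EXAC': '0.00165', 'AF_TGP': '0.00619'}
```

A dict makes lookups by key direct, but it keeps only the last value if a key repeats; a list of tuples would be the safer choice when duplicates matter.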


CodePudding user response：

This can also be solved with the split method:

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print([tuple(i.split('=')) for i in text.split(';')])

output:

[('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
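
Since the question mentions a dataframe column, here is a rough sketch of applying the same split approach to several cell values and converting the numbers to floats. A plain list stands in for the column, and both the helper name `to_float_pairs` and the second sample row are made up for illustration:

```python
def to_float_pairs(cell):
    """Parse one 'k=v;k=v' cell into a list of (key, float_value) tuples."""
    pairs = []
    for item in cell.split(';'):
        key, value = item.split('=', 1)  # split at the first '=' only
        pairs.append((key, float(value)))
    return pairs

# A plain list standing in for the dataframe column (second row is invented).
column = [
    "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619",
    "AF_ESP=0.1;AF_TGP=0.2",
]

parsed = [to_float_pairs(cell) for cell in column]
print(parsed[1])
# => [('AF_ESP', 0.1), ('AF_TGP', 0.2)]
```

With pandas, the same function could presumably be passed to `Series.apply` on the relevant column. Note that `float()` raises `ValueError` on non-numeric values, so real data may need extra handling.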