Regex to Ignore Semicolon

Time:04-17

I have a column in a DataFrame containing key-value pairs that I would like to extract, for example:

AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619

I would like to parse them into key-value pairs like this:

('AF_ESP', '0.00546')
('AF_EXAC', '0.00165')
('AF_TGP', '0.00619')

Here is my regex.

([^=]+)=([^;]+)

This gets me most of the way there:

('AF_ESP', '0.00546')
(';AF_EXAC', '0.00165')
(';AF_TGP', '0.00619')

How can I adjust it so semicolons are not captured in the result?

Answer:

You can consume the semicolon (or the start of the string) in front of each key:

(?:;|^)([^=]+)=([^;]+)

See the regex demo. Details:

  • (?:;|^) - a non-capturing group matching ; or start of string
  • ([^=]+) - Group 1: one or more chars other than =
  • = - a = char
  • ([^;]+) - Group 2: one or more chars other than ;
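For comparison, a quick check of why the question's pattern picks up the `;`: the key class `[^=]+` excludes only `=`, so every match after the first starts right on the separator. The second pattern below is an alternative that is not part of this answer: it excludes `;` from the key class as well, so no leading `(?:;|^)` is needed.

```python
import re

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"

# Question's pattern: [^=]+ also accepts ';', so the separator ends up
# at the start of every key after the first one.
original = re.findall(r'([^=]+)=([^;]+)', text)
print(original)
# [('AF_ESP', '0.00546'), (';AF_EXAC', '0.00165'), (';AF_TGP', '0.00619')]

# Alternative: exclude both '=' and ';' from the key class, so a match
# can never begin on the separator.
fixed = re.findall(r'([^=;]+)=([^;]+)', text)
print(fixed)
# [('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
```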

See the Python demo:

import re
text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print(re.findall(r'(?:;|^)([^=]+)=([^;]+)', text))
# => [('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]
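Because the pattern has two capturing groups, `re.findall` returns `(key, value)` tuples, which `dict()` accepts directly. Since the question mentions a DataFrame column, here is a sketch that parses each cell into a dict; `parse_row` and the sample `rows` list are hypothetical, standing in for the real column.

```python
import re

# Compile once; reused for every cell in the column
PAIR_RE = re.compile(r'(?:;|^)([^=]+)=([^;]+)')

def parse_row(value):
    """Return the key-value pairs of one cell as a dict of strings."""
    return dict(PAIR_RE.findall(value))

# Hypothetical cell values standing in for the DataFrame column
rows = [
    "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619",
    "AF_EXAC=0.00002",
]

parsed = [parse_row(value) for value in rows]
print(parsed[0])
# {'AF_ESP': '0.00546', 'AF_EXAC': '0.00165', 'AF_TGP': '0.00619'}
print(parsed[1])
# {'AF_EXAC': '0.00002'}
```

With pandas, the same function could be applied per cell with `df['info'].apply(parse_row)`, where `'info'` is a placeholder for the actual column name. The values stay strings, so convert them (e.g. with `float`) if you need numbers.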

A non-regex solution is also possible:

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print([x.split('=') for x in text.split(';')])
# => [['AF_ESP', '0.00546'], ['AF_EXAC', '0.00165'], ['AF_TGP', '0.00619']]
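One caveat with plain `split('=')`: a trailing `;` produces an empty piece that becomes `['']`, and a value containing `=` is split into three parts. A slightly more defensive sketch uses `str.partition`, which splits on the first `=` only; the helper `parse_pairs` and the second input are hypothetical, not from the question.

```python
def parse_pairs(text):
    """Split 'k1=v1;k2=v2' into a list of (key, value) tuples."""
    pairs = []
    for item in text.split(';'):
        if not item:
            # Skip empty pieces, e.g. from a trailing ';'
            continue
        # partition splits on the first '=' only and always returns
        # three parts; an item with no '=' gets an empty value
        key, _, value = item.partition('=')
        pairs.append((key, value))
    return pairs

print(parse_pairs("AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"))
# [('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]

# Made-up input: trailing ';' and a value that contains '='
print(parse_pairs("A=1;B=x=y;"))
# [('A', '1'), ('B', 'x=y')]
```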

See this Python demo.

Answer:

This can also be solved with str.split:

text = "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619"
print([tuple(i.split('=')) for i in text.split(';')])

Output:

[('AF_ESP', '0.00546'), ('AF_EXAC', '0.00165'), ('AF_TGP', '0.00619')]