Regex for aviary values in python

Time:08-24

I have the following string sample (in a pandas column):

mystring = "[{'alphitobius_factor': -13.67912, 'createdAt': '2022-06-28 04:37:36', 'salmonella_factor': 72.5203}, {'createdAt': '2022-06-28 04:37:47', 'alphitobius_factor': 15.0000, 'salmonella_factor': -14.67221}, {'salmonella_factor': -36.982361, 'createdAt': '2022-07-17 11:02:39.297637', 'id': None, 'alphitobius_factor': -12.04351}]"

I want to capture all alphitobius numbers so I tried this:

import re
re.findall(r"bius_factor:\s(-?\d.\d+),", mystring)

But it returns []. Could you help me with this regular expression? I think the digit-dot-digit part is fine, but perhaps the bius_factor part is where I went wrong.

CodePudding user response:

Your pattern was close, but it is missing the ' that follows bius_factor in the string. I also tweaked the capture group a bit:

import re

my_string = "[{'alphitobius_factor': -13.67912, 'createdAt': '2022-06-28 04:37:36', 'salmonella_factor': 72.5203}, {'createdAt': '2022-06-28 04:37:47', 'alphitobius_factor': 15.0000, 'salmonella_factor': -14.67221}, {'salmonella_factor': -36.982361, 'createdAt': '2022-07-17 11:02:39.297637', 'id': None, 'alphitobius_factor': -12.04351}]"

print(re.findall(r"bius_factor':\s([-\d.]+)", my_string))
# ['-13.67912', '15.0000', '-12.04351']

In your comment on the question you included an extra ' at the beginning of the pattern.
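
To see why the original pattern came back empty, here is a minimal check on a shortened copy of the sample (the variable names are only illustrative). The text contains bius_factor': with a closing quote before the colon, so a pattern without that quote never finds a place to start matching:

```python
import re

# Shortened copy of the sample from the question (first record only).
sample = "[{'alphitobius_factor': -13.67912, 'createdAt': '2022-06-28 04:37:36', 'salmonella_factor': 72.5203}]"

# Pattern without the quote: the text "bius_factor:" never occurs in the sample.
without_quote = re.findall(r"bius_factor:\s(-?\d.\d+),", sample)

# Pattern with the quote included before the colon.
with_quote = re.findall(r"bius_factor':\s([-\d.]+)", sample)

print(without_quote)  # []
print(with_quote)     # ['-13.67912']
```

Note that [-\d.]+ accepts any run of digits, dots, and minus signs, so it relies on the values already being well-formed numbers.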

CodePudding user response:

If you need a regex, one way to do it is to use a positive lookbehind for alphitobius_factor:

re.findall(r"(?<='alphitobius_factor')\s*\:\s*(\-?\d+\.*\d*)", mystring)

This regex matches positive and negative values, with or without decimals, but only those whose key is 'alphitobius_factor'.

which gives:

['-13.67912', '15.0000', '-12.04351']
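
The same lookbehind idea can be sketched for any key. The helper name extract_factor below is hypothetical, and the sketch swaps \.* for \.? so that at most one decimal point is accepted; treat it as one possible variant rather than the only way to write it:

```python
import re

# Sample string copied from the question.
mystring = "[{'alphitobius_factor': -13.67912, 'createdAt': '2022-06-28 04:37:36', 'salmonella_factor': 72.5203}, {'createdAt': '2022-06-28 04:37:47', 'alphitobius_factor': 15.0000, 'salmonella_factor': -14.67221}, {'salmonella_factor': -36.982361, 'createdAt': '2022-07-17 11:02:39.297637', 'id': None, 'alphitobius_factor': -12.04351}]"


def extract_factor(text, key):
    """Return the numeric strings stored under `key` (hypothetical helper)."""
    # repr(key) adds the surrounding single quotes, e.g. 'alphitobius_factor'.
    # re.escape keeps the lookbehind a fixed-width literal, which Python requires.
    pattern = r"(?<=" + re.escape(repr(key)) + r")\s*:\s*(-?\d+\.?\d*)"
    return re.findall(pattern, text)


alphitobius = extract_factor(mystring, "alphitobius_factor")
salmonella = extract_factor(mystring, "salmonella_factor")

print(alphitobius)  # ['-13.67912', '15.0000', '-12.04351']
print(salmonella)   # ['72.5203', '-14.67221', '-36.982361']
```

Because the key is part of the lookbehind, the timestamps under 'createdAt' are never picked up even though they contain digits and dots.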

But I think it's much easier to use ast.literal_eval:

import ast
[v for d in ast.literal_eval(mystring) for k, v in d.items() if k == 'alphitobius_factor']

output:

[-13.67912, 15.0, -12.04351]
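
As a rough sketch, the literal_eval approach can be wrapped in a small function that also skips records lacking the key. The name values_for_key and its default argument are assumptions made for illustration:

```python
import ast

# Sample string copied from the question.
mystring = "[{'alphitobius_factor': -13.67912, 'createdAt': '2022-06-28 04:37:36', 'salmonella_factor': 72.5203}, {'createdAt': '2022-06-28 04:37:47', 'alphitobius_factor': 15.0000, 'salmonella_factor': -14.67221}, {'salmonella_factor': -36.982361, 'createdAt': '2022-07-17 11:02:39.297637', 'id': None, 'alphitobius_factor': -12.04351}]"


def values_for_key(text, key="alphitobius_factor"):
    """Collect `key` from every record that has it (hypothetical helper)."""
    # literal_eval parses Python literals only; it does not execute code.
    records = ast.literal_eval(text)
    return [record[key] for record in records if key in record]


factors = values_for_key(mystring)
ids = values_for_key(mystring, "id")

print(factors)  # [-13.67912, 15.0, -12.04351]
print(ids)      # [None]
```

Since the question mentions a pandas column, something like df["col"].apply(values_for_key) should apply it row by row, where df and "col" are placeholders for the actual DataFrame and column name.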

CodePudding user response:

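Comparing the two patterns:

r"bius_factor:\s(-?\d.\d+),"      # your code
r"bius_factor':\s(-*\d+\.\d+)"    # my code

Your pattern is missing the ' after bius_factor.

-* lets the minus sign appear zero or more times (your -? for zero or one also works).

\d+ lets the integer part have one or more digits; a bare \d matches exactly one.

\. escapes the decimal point; an unescaped . matches any character.

The trailing comma is also left out, because the last alphitobius_factor value in the sample is followed by } rather than a comma and would otherwise be skipped.

With this regex I get the numbers as strings:

lis = re.findall(r"bius_factor':\s(-*\d+\.\d+)", mystring)
print(lis)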

['-13.67912', '15.0000', '-12.04351']

After that, if you need them as numbers, you can use a list comprehension:

[float(x) for x in lis]

[-13.67912, 15.0, -12.04351]