I have the following string sample (in a pandas column):
mystring = "[{'alphitobius_factor': -13.67912, 'createdAt': '2022-06-28 04:37:36', 'salmonella_factor': 72.5203}, {'createdAt': '2022-06-28 04:37:47', 'alphitobius_factor': 15.0000, 'salmonella_factor': -14.67221}, {'salmonella_factor': -36.982361, 'createdAt': '2022-07-17 11:02:39.297637', 'id': None, 'alphitobius_factor': -12.04351}]"
I want to capture all alphitobius numbers so I tried this:
import re
re.findall(r"bius_factor:\s(-?\d+.\d+),", mystring)
But it returns []. Could you please help me with this regex? I guess the digit-dot-digit part is OK, but perhaps the bius_factor part is where I'm going wrong.
CodePudding user response:
Your pattern was close, but you missed the ' between bius_factor and the colon. I also tweaked the capture group a bit:
import re
my_string = "[{'alphitobius_factor': -13.67912, 'createdAt': '2022-06-28 04:37:36', 'salmonella_factor': 72.5203}, {'createdAt': '2022-06-28 04:37:47', 'alphitobius_factor': 15.0000, 'salmonella_factor': -14.67221}, {'salmonella_factor': -36.982361, 'createdAt': '2022-07-17 11:02:39.297637', 'id': None, 'alphitobius_factor': -12.04351}]"
print(re.findall(r"bius_factor':\s([-\d.]+)", my_string))
# ['-13.67912', '15.0000', '-12.04351']
In your comment on the question you included an extra ' at the beginning of the pattern.
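To see the effect of the missing quote in isolation, here is a minimal sketch on a shortened sample (the names sample, without_quote and with_quote are only for illustration). It assumes the pattern from the question otherwise stays the same, apart from escaping the dot:

```python
import re

# Shortened, hypothetical sample in the same shape as the data in the question.
sample = "{'alphitobius_factor': -13.67912, 'salmonella_factor': 72.5203}"

# Pattern without the ': the text has a quote between "bius_factor" and ":",
# so the literal part of the pattern never matches.
without_quote = re.findall(r"bius_factor:\s(-?\d+.\d+),", sample)

# Same pattern with the ' added and the dot escaped.
with_quote = re.findall(r"bius_factor':\s(-?\d+\.\d+),", sample)

print(without_quote)  # []
print(with_quote)     # ['-13.67912']
```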
CodePudding user response:
If you need a regex, one way is to use a positive lookbehind for 'alphitobius_factor', like:
re.findall(r"(?<='alphitobius_factor')\s*:\s*(-?\d+\.?\d*)", mystring)
This regex matches values with or without decimals, positive or negative, but only those whose key is 'alphitobius_factor'.
It gives:
['-13.67912', '15.0000', '-12.04351']
But I think it's much easier to use ast.literal_eval:
import ast
[v for d in ast.literal_eval(mystring) for k, v in d.items() if k == 'alphitobius_factor']
output:
[-13.67912, 15.0, -12.04351]
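If some of the dicts might not contain the key at all, the same idea can be wrapped in a small helper. This is only a sketch: extract_factor and mystring_short are made-up names, and it assumes every string is a valid Python literal (a list of dicts), which is what ast.literal_eval requires:

```python
import ast

def extract_factor(text, key="alphitobius_factor"):
    """Parse a string holding a Python list of dicts and collect one key's values."""
    records = ast.literal_eval(text)
    # Skip dicts that do not have the key instead of raising KeyError.
    return [d[key] for d in records if key in d]

# Shortened, hypothetical sample; the middle dict has no alphitobius_factor.
mystring_short = "[{'alphitobius_factor': -13.67912, 'id': None}, {'salmonella_factor': 1.5}, {'alphitobius_factor': 15.0000}]"
print(extract_factor(mystring_short))  # [-13.67912, 15.0]
```

For a pandas column, something like df['col'].apply(extract_factor) should work, where df and 'col' stand in for your own DataFrame and column name.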
CodePudding user response:
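Comparing the two regexes:
r"bius_factor:\s(-?\d+.\d+),"  # your code
r"bius_factor':\s(-*\d+\.\d+)"  # my code
You are missing the ' after bius_factor. In my version, -* lets the minus sign appear zero or more times, \d+ lets the digits appear one or more times, and \. escapes the decimal point so it is matched literally. I also dropped the trailing comma, because the last value in the sample is followed by } rather than a comma and would otherwise be skipped.
I have tried this regex and was able to get the numbers as strings:
lis = re.findall(r"bius_factor':\s(-*\d+\.\d+)", mystring)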
print(lis)
['-13.67912', '15.0000', '-12.04351']
After that, if you need them as numbers, you can use a list comprehension:
[float(x) for x in lis]
[-13.67912, 15.0, -12.04351]
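Matching and converting can also be combined into one step. The following is a rough sketch, not the only way to do it: FACTOR_RE and alphitobius_values are illustrative names, and the pattern assumes plain decimal numbers (no exponents such as 1e-5) and also accepts integers without a fractional part:

```python
import re

# Full key name, optional minus sign, digits with an optional fractional part.
FACTOR_RE = re.compile(r"'alphitobius_factor':\s*(-?\d+(?:\.\d+)?)")

def alphitobius_values(text):
    """Return every alphitobius_factor value found in text as a float."""
    return [float(m) for m in FACTOR_RE.findall(text)]

# Shortened, hypothetical sample with a value at the end of a dict and an integer value.
sample = "[{'alphitobius_factor': -13.67912, 'x': 1}, {'alphitobius_factor': 15.0000}, {'alphitobius_factor': 7}]"
print(alphitobius_values(sample))  # [-13.67912, 15.0, 7.0]
```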