I am trying to parse lines in a file and, in particular, extract any numbers that contain a decimal point, since these indicate dollar values. Here is what I have so far:
import re
sample_text = " JAN 01 19 SOME OTHER STRING .25 1.56 12,345.67"
print(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))) # Find numbers with decimals in them
print(len(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))))
The output of the above is:
['1.56', '12345.67']
2
So the ".25" is being ignored, I think because it has no digit before the decimal point. When I add a leading zero it works, but the files I am reading are large and there are quite a few of them, so I don't want to add a leading zero to every such decimal in every file:
sample_text = " JAN 01 19 SOME OTHER STRING 0.25 1.56 12,345.67"
print(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))) # Find numbers with decimals in them
print(len(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))))
Output:
['0.25', '1.56', '12345.67']
3
I also tried inserting a leading zero before the decimals that lack one, but it didn't give me what I wanted:
sample_text = re.sub(",", "", sample_text)
print(sample_text)
sample_text = re.sub(" .", "0.", sample_text)
print(sample_text)
print(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))) # Find numbers with decimals in them
print(len(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))))
Output:
JAN 01 19 SOME OTHER STRING .25 1.56 12345.67
0.AN0.10.90.OME0.THER0.TRING0.0.0.0.0.250.0.1.560.0.0.0.0.0.2345.67
['0.10', '0.0', '0.0', '0.250', '0.1', '560.0', '0.0', '0.0', '2345.67']
9
CodePudding user response:
Instead of + before the dot, you could use *, which also matches when there is no digit before the dot.
out = re.findall(r"\d*\.\d+", re.sub(",", "", sample_text))
Output:
['.25', '1.56', '12345.67']
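As a rough sketch, the same idea can be wrapped in a small helper so it can be applied line by line across many files. The name find_decimals is hypothetical, not something from the question. The second part shows one possible reason the substitution attempt in the question went wrong: an unescaped "." in a regex matches any character, so " ." replaced every space together with the character after it.

```python
import re

def find_decimals(text):
    """Return every number in text that contains a decimal point (hypothetical helper)."""
    # Drop thousands separators so "12,345.67" becomes "12345.67".
    cleaned = text.replace(",", "")
    # \d* allows zero digits before the dot, so ".25" matches too.
    return re.findall(r"\d*\.\d+", cleaned)

sample_text = " JAN 01 19 SOME OTHER STRING .25 1.56 12,345.67"
amounts = find_decimals(sample_text)
print(amounts)       # ['.25', '1.56', '12345.67']
print(len(amounts))  # 3

# Alternative: keep the original \d+\.\d+ pattern and pad first.
# Escaping the dot restricts the match to a space followed by a literal dot.
padded = re.sub(r" \.", " 0.", sample_text.replace(",", ""))
print(re.findall(r"\d+\.\d+", padded))  # ['0.25', '1.56', '12345.67']
```

If the values are needed as numbers rather than strings, float() accepts strings such as ".25" directly, so no leading zero has to be added first.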