Extract All Numbers with Decimals from String

I am trying to parse lines in a file and, in particular, extract any numbers that have decimals, since these indicate dollar values. This is what I have so far:

import re

sample_text = " JAN 01 19 SOME OTHER STRING         .25    1.56           12,345.67"
print(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text)))  # Find numbers with decimals in them
print(len(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))))

The output of the above is:

['1.56', '12345.67']
2

So the ".25" is being ignored, I think because it doesn't have a leading zero. When I add a leading zero, it works, but the files I am reading are quite large and there are quite a few of them, so I don't want to have to add a leading zero to every decimal that lacks one:

sample_text = " JAN 01 19 SOME OTHER STRING         0.25    1.56           12,345.67"
print(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text)))  # Find numbers with decimals in them
print(len(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))))

Output:

['0.25', '1.56', '12345.67']
3

I did try the following to add a leading zero to the decimals without a leading zero but it didn't give me what I wanted:

sample_text = re.sub(",", "", sample_text)
print(sample_text)
sample_text = re.sub(" .", "0.", sample_text)
print(sample_text)
print(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text)))  # Find numbers with decimals in them
print(len(re.findall(r"\d+\.\d+", re.sub(",", "", sample_text))))

Output:

 JAN 01 19 SOME OTHER STRING         .25    1.56           12345.67
0.AN0.10.90.OME0.THER0.TRING0.0.0.0.0.250.0.1.560.0.0.0.0.0.2345.67
['0.10', '0.0', '0.0', '0.250', '0.1', '560.0', '0.0', '0.0', '2345.67']
9

CodePudding user response：

Instead of + before the dot, you could use *, in case there are no digits before it.

out = re.findall(r"\d*\.\d+", re.sub(",", "", sample_text))

Output:

['.25', '1.56', '12345.67']
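
For reuse across many lines or files, the same pattern could be wrapped in a small function. This is only a sketch: the names `DECIMAL_PATTERN` and `extract_decimals` are made up for illustration, and converting the matches to `Decimal` is an assumption based on the values being dollar amounts, not something the question asked for.

```python
import re
from decimal import Decimal

# \d* allows zero digits before the dot, so ".25" matches as well as "1.56".
DECIMAL_PATTERN = re.compile(r"\d*\.\d+")


def extract_decimals(line):
    """Return every decimal number found in line as a Decimal (hypothetical helper)."""
    cleaned = line.replace(",", "")  # drop thousands separators, as in the question
    return [Decimal(match) for match in DECIMAL_PATTERN.findall(cleaned)]


sample_text = " JAN 01 19 SOME OTHER STRING         .25    1.56           12,345.67"
values = extract_decimals(sample_text)
print(values)
print(len(values))
```

As a side note, the `re.sub(" .", "0.", sample_text)` attempt in the question went wrong because an unescaped `.` in a pattern matches any character, so every space plus the character after it was replaced.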