Extracting numbers with decimal points from text extracted from PDF files

Time:03-01

I need to extract only the numbers that contain a decimal point from the following string. I used the re module, but ran into trouble with the commas: a number can have no commas or more than one. Another problem is that decimal numbers can be immediately followed by words (e.g. 1,513,971.63Savings). Because the string was extracted from PDF files, I can't change its format.

Sample string:

Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy

Expected output:

19,858,700.86
350,745,799.38
174,381.98
1,125,990.66
131,647.15

Can anyone help?

CodePudding user response:

I guess you missed the 174,381.98. If so, use the (\d+(?:[,.]\d+)+) pattern to fetch the expected numbers. It matches a run of digits followed by one or more groups made of a comma or a dot plus more digits.

import re

string = """Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"""

# Print each match on its own line
print(*re.findall(r"(\d+(?:[,.]\d+)+)", string), sep="\n")
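Note that the pattern above also accepts values with no decimal point, such as 1,000, because a comma and a dot are treated the same way. If that matters, a stricter variant might look like the sketch below. It is not part of the original answer: the helper name extract_decimals is hypothetical, and it assumes that commas only ever separate groups of three digits and that the decimal part is never directly followed by another digit.

```python
import re

# Digits, optional ",ddd" thousands groups, then a mandatory ".digits" part.
# Assumption: commas only separate groups of exactly three digits.
DECIMAL_RE = re.compile(r"\d+(?:,\d{3})*\.\d+")


def extract_decimals(text):
    """Return every number in text that contains a decimal point."""
    return DECIMAL_RE.findall(text)


sample = (
    "Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS "
    "19,858,700.86Current Deposit12102010010165 "
    "350,745,799.38Saving Deposits12102010050170 "
    "174,381.98SB Bidhaba Bhata12102010060171 "
    "1,125,990.66SB Bayaska Bhata12102010070172 "
    "131,647.15SB Pratibandhy"
)

print(*extract_decimals(sample), sep="\n")
```

With this variant, plain integers such as the account numbers, and comma-only values such as 1,000, are skipped, while a trailing word as in 1,513,971.63Savings does not affect the match.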