Split digit using regex python

Time:02-21

I am trying to scrape data from https://www.mygov.in/covid-19, but extracting the digits raises a new problem. Each number shows the current value followed by how much it changed, e.g. 3,81,74,366⬆54,229.

When I scrape I get the text as 3,81,74,36654,229. So how can I get the current value only?

eg:
3,81,74,36654,229 to 3,81,74,366
10,79,894198 to 10,79,894
22,40,7200 to 22,40,720

How to do this? Please help

CodePudding user response:

Here's an extract of an HTML fragment from that page:

<p >8,43,56,092
  <span >39,477</span>
</p>

If you get the text for the p element, the return value will be merged with the span content.

Consider doing this:

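# Assumes `soup` is a BeautifulSoup object built from the page HTML.
for p in soup.select('p.mid-wrap'):
    span = p.find('span')
    if span:
        # The nested span holds the change value.
        spantext = span.getText()
        print(spantext)
        # Remove the span so it no longer contributes to the p text.
        span.extract()
    # Strip the whitespace left behind around the current value.
    print(p.getText().strip())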

Output:

39,477
8,43,56,092
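
If BeautifulSoup is not available, the same idea (keep the text that sits directly in the p element and treat the span text separately) can be sketched with the standard library's html.parser. This is only an illustration: the class name CurrentValueParser and the sample markup string are assumptions, not something taken from the page's real source.

```python
from html.parser import HTMLParser


class CurrentValueParser(HTMLParser):
    """Collect text directly inside <p>, separately from text nested in <span>."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.span_depth = 0
        self.values = []   # current values (direct text of each <p>)
        self.changes = []  # change values (text of nested <span>)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self._buf = []
        elif tag == "span" and self.in_p:
            self.span_depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth:
            self.span_depth -= 1
        elif tag == "p" and self.in_p:
            # Join the direct text pieces and drop surrounding whitespace.
            self.values.append("".join(self._buf).strip())
            self.in_p = False

    def handle_data(self, data):
        if not self.in_p:
            return
        if self.span_depth:
            self.changes.append(data.strip())
        else:
            self._buf.append(data)


# Hypothetical sample modelled on the fragment quoted above.
html = '<p class="mid-wrap">8,43,56,092\n  <span>39,477</span>\n</p>'
parser = CurrentValueParser()
parser.feed(html)
parser.close()
print(parser.values)
print(parser.changes)
```

Separating the two values while parsing avoids having to split the merged string afterwards.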

CodePudding user response:

Assuming every number is at least 1,000 and the current value comes first in the string, a pattern like this should work:

^.*?,\d{3}
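
A minimal sketch of how that pattern could be applied with Python's re module, using the three strings from the question. The helper name current_value is an assumption made for illustration. The lazy `.*?` stops at the first comma that is followed by three digits, which works here because in the Indian grouping used on the page only the last group has three digits.

```python
import re

# Lazily match from the start up to the first comma followed by three digits.
CURRENT_VALUE = re.compile(r"^.*?,\d{3}")


def current_value(text):
    """Return the leading current value, or None if the pattern does not match."""
    m = CURRENT_VALUE.match(text)
    return m.group(0) if m else None


samples = ["3,81,74,36654,229", "10,79,894198", "22,40,7200"]
for s in samples:
    print(s, "->", current_value(s))
```

Values below 1,000 contain no comma, so the helper returns None for them, and numbers grouped in threes throughout (such as 1,234,567) would be cut after the first three-digit group.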