How do I return all strings containing a "£" in them?

Time:10-20

I'm trying to web scrape a site that is badly designed and I am trying to gather the prices of items. The only thing in common with each page is that the prices all start with a "£" so I thought that if I searched through all the HTML content and returned all strings with "£" attached it would work.

I am not quite sure how to go about this. Any help is greatly appreciated.

Kind regards

CodePudding user response:

If you just want to pull out the prices with a '£' prefix, you can try something like this:

import re

html = """
cost of living is £2,232
bottle of milk costs £1 and it goes up to £1.05 a year later...
"""

# "£" followed by one or more non-whitespace characters
print(re.findall(r"£\S+", html))

Output:

['£2,232', '£1', '£1.05']
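One caveat: because \S+ takes every non-whitespace character, it also picks up punctuation that directly follows a price, such as a full stop at the end of a sentence. A stricter pattern is sketched below. It assumes prices look like "£", then digits with optional comma thousands separators, then an optional decimal part. The names PRICE_RE and find_prices are made up for this illustration.

```python
import re

# Assumed price format: "£", digits with optional ",ddd" groups,
# then an optional decimal part such as ".05".
PRICE_RE = re.compile(r"£\d+(?:,\d{3})*(?:\.\d+)?")


def find_prices(text):
    """Return every price-like substring found in text."""
    return PRICE_RE.findall(text)


sample = "It costs £1.05. The total is £2,232, roughly."
print(find_prices(sample))  # ['£1.05', '£2,232']
```

With the looser £\S+ pattern, the same sample would give '£1.05.' and '£2,232,' with the trailing punctuation still attached.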

If you also want the item name alongside each price, the regular expression will need to be extended. The BeautifulSoup Python library can be used to extract information even from malformed HTML.
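To return whole strings that contain a "£" rather than just the price, one option is to walk the text nodes of the page and keep the ones with a pound sign in them. The sketch below uses the standard library's html.parser module instead of BeautifulSoup, only so that it runs without installing anything. The class name PoundTextCollector, the function strings_with_pound and the sample markup are all invented for this example.

```python
from html.parser import HTMLParser


class PoundTextCollector(HTMLParser):
    """Collects every text node that contains a pound sign."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        text = data.strip()
        if "£" in text:
            self.matches.append(text)


def strings_with_pound(html):
    """Return the text nodes of html that contain '£'."""
    parser = PoundTextCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered
    return parser.matches


# Hypothetical markup standing in for the real page.
page = "<ul><li>Milk <b>£1.05</b></li><li>Bread: £0.80</li><li>Out of stock</li></ul>"
print(strings_with_pound(page))  # ['£1.05', 'Bread: £0.80']
```

Note that "Milk" is lost in the first item because the price sits in its own tag. If the item name and price are split across tags like that, you would need to look at the parent element's full text instead, which is where a library like BeautifulSoup becomes more convenient.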
