Scraping a string and converting it to an integer

Time:06-12

import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.indeed.com/jobs?q=web development&start=0")
source = result.content
soup = BeautifulSoup(source, "lxml")

job_posted = soup.find("div", {"id": "searchCountPages"}).text.strip()
job_posted = job_posted[10:-5].replace(",", "")
job_posted = int(job_posted)
print(job_posted)

I tried to convert a string to an integer after scraping it from a website. When I run the program, sometimes it works and sometimes it doesn't. When it fails, I get this error: ValueError: invalid literal for int() with base 10: 's | Page 1 of '

The text highlighted in yellow is what I was trying to scrape.

CodePudding user response:

As mentioned, regex is appropriate here:

import re


p = re.compile(r"Page (\d*) of (\d*) jobs")


job_posted = soup.find("div", {"id": "searchCountPages"}).text.strip().replace(",", "")
page_num, page_count = map(int, p.match(job_posted).groups())

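Note that this raises an error if that exact pattern isn't found: p.match() returns None in that case, so calling .groups() on it raises AttributeError. Also note that, despite its name, page_count holds the number preceding "jobs" in the text, i.e. the job count.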

Output:

In [3]: page_num, page_count = map(int, p.match(job_posted).groups())

In [4]: page_num
Out[4]: 1

In [5]: page_count
Out[5]: 96575
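
The caveat about a missing pattern can be handled by checking the match before converting. The following is a minimal sketch, not the answerer's code: the helper name parse_search_count and the sample strings are hypothetical, and it assumes the element text looks like "Page 1 of 96,575 jobs". It uses search() instead of match() so that leading text does not prevent a match, and returns None rather than raising when the pattern is absent.

```python
import re
from typing import Optional, Tuple

# Assumed text layout (taken from the pattern in the answer): "Page 1 of 96,575 jobs".
COUNT_PATTERN = re.compile(r"Page\s+(\d+)\s+of\s+([\d,]+)\s+jobs")


def parse_search_count(text: str) -> Optional[Tuple[int, int]]:
    """Return (page_num, job_count), or None if the text does not match."""
    found = COUNT_PATTERN.search(text)
    if found is None:
        # Pattern not present: let the caller decide what to do.
        return None
    # Strip thousands separators before converting each group to int.
    page_num, job_count = (int(g.replace(",", "")) for g in found.groups())
    return page_num, job_count


# Hypothetical sample inputs; the second is the fragment from the error message.
print(parse_search_count("Page 1 of 96,575 jobs"))  # (1, 96575)
print(parse_search_count("s | Page 1 of "))         # None
```

In the scraper itself, soup.find(...) also returns None when the element is missing, so that result would need the same kind of check before .text is accessed.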