import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.indeed.com/jobs?q=web development&start=0")
source = result.content
soup = BeautifulSoup(source, "lxml")
job_posted = soup.find("div", {"id": "searchCountPages"}).text.strip()
job_posted = job_posted[10:-5].replace(",", "")
job_posted = int(job_posted)
print(job_posted)
I tried to convert a string to an integer after scraping it from a website. When I run the program, sometimes it works and other times it doesn't. I get this error: ValueError: invalid literal for int() with base 10: 's | Page 1 of '
The text highlighted in yellow is what I was trying to scrape.
CodePudding user response:
As mentioned, regex is appropriate here:
import re
p = re.compile(r"Page (\d+) of (\d+) jobs")
job_posted = soup.find("div", {"id": "searchCountPages"}).text.strip().replace(",", "")
page_num, page_count = map(int, p.match(job_posted).groups())
Note that this raises an AttributeError if the text does not start with that exact pattern, because p.match returns None when there is no match.
Output:
In [3]: page_num, page_count = map(int, p.match(job_posted).groups())
In [4]: page_num
Out[4]: 1
In [5]: page_count
Out[5]: 96575
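If the text is not guaranteed to start with "Page" (the error message in the question suggests it sometimes carries a prefix), a more forgiving variant is to search anywhere in the string and return None instead of raising. This is only a sketch: the helper name parse_search_count and the sample strings are made up for illustration, and the real page text may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of the original answer.
# \d+ requires at least one digit, so int() never receives an empty string.
PAGE_PATTERN = re.compile(r"Page (\d+) of (\d+) jobs")


def parse_search_count(text: str) -> Optional[Tuple[int, int]]:
    """Return (page_num, page_count) parsed from text, or None if the pattern is absent."""
    cleaned = text.strip().replace(",", "")
    # search() scans the whole string; match() only matches at the start.
    found = PAGE_PATTERN.search(cleaned)
    if found is None:
        return None
    page_num, page_count = map(int, found.groups())
    return page_num, page_count


# Sample strings are assumptions about what the page might contain.
print(parse_search_count("Page 1 of 96,575 jobs"))
print(parse_search_count("Jobs | Page 1 of 96,575 jobs"))
print(parse_search_count("no count on this page"))
```

With the soup object from the question, it could be called as parse_search_count(soup.find("div", {"id": "searchCountPages"}).text), checking for None before using the result.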