Home > Net >  How can I solve Keyerror( return self.attrs[key]) to extract data on Python?
How can I solve Keyerror( return self.attrs[key]) to extract data on Python?

Time:09-22

I am trying to make web Scraper with Python and there is a problem in extracting title of company.

def extract_indeed_job():
jobs = []
result = requests.get(f"{url}&start={0*LIMIT}")
result_soup = BeautifulSoup(result.text, "html.parser")
results = result_soup.find_all("a", {"class": "tapItem"})
for result in results:
    title = result.find("h2", {"class": "jobTitle"}).find("span")["title"]
    company = result.find("span", {"class": "companyName"}).get_text()
    location = result.find("div", {"class": "companyLocation"}).get_text()
    print(title, company, location)

Some of posts, there are two span tags in the h2 class="jobTitle" tag enter image description here

And I need to get just span title. So I wrote in with this tag. But, Python notices the key error and it doesn't work.

What can I do to solve? Is there any problem in my code??

CodePudding user response:

Note that there are multiple <span>s inside <h2> element. You want <span> which is immediate child of <h2> rather than <span> inside <div>, to get it you might replace

result.find("h2", {"class": "jobTitle"}).find("span")

using

result.find("h2", {"class": "jobTitle"}).find("span", recursive=False)

This will prevent recursive search (i.e. looking for children of children and further)

CodePudding user response:

the True ensure that you are filtering those span with that attribute so when you try to access to its value you don't get an error. The find just returns a span careless of the attributes that you need.

result.find("span", title=True)['title']

The code and html you provided are ambiguos. Your statement title = result.find("h2", {"class": "jobTitle"}) will never match the h2 tag because its class attribute is more complex, ``jobTitle jobTitle-color-purple jobTitle-newJob`. To match that you need

import re
...

result.find("h2", class_=re.compile(r'jobTitle'))

Use regular expression to improve the search in the soup.

  • Related