How can I solve KeyError (return self.attrs[key]) when extracting data in Python?

I am trying to make a web scraper with Python, and I have a problem extracting the job title from each posting.

import requests
from bs4 import BeautifulSoup

def extract_indeed_job():
    jobs = []
    result = requests.get(f"{url}&start={0*LIMIT}")
    result_soup = BeautifulSoup(result.text, "html.parser")
    results = result_soup.find_all("a", {"class": "tapItem"})
    for result in results:
        # The KeyError is raised on the next line
        title = result.find("h2", {"class": "jobTitle"}).find("span")["title"]
        company = result.find("span", {"class": "companyName"}).get_text()
        location = result.find("div", {"class": "companyLocation"}).get_text()
        print(title, company, location)

In some posts, there are two span tags inside the h2 class="jobTitle" tag.

I only need the span that carries the title attribute, so I indexed the span with ["title"]. But Python raises a KeyError and it doesn't work.

How can I solve this? Is there a problem in my code?

CodePudding user response：

Note that there are multiple <span> elements inside the <h2> element. You want the <span> that is an immediate child of the <h2> rather than the <span> inside the <div>. To get it, you might replace

result.find("h2", {"class": "jobTitle"}).find("span")

with

result.find("h2", {"class": "jobTitle"}).find("span", recursive=False)

This prevents a recursive search (i.e. looking at children of children and further down), so only direct children of the <h2> are considered.
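A minimal sketch of the difference, using a made-up HTML snippet. The markup below is an assumption about what a job card roughly looks like, not the site's actual HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <span> in document order sits inside a <div>
# and has no title attribute; the second is a direct child of the <h2>.
SAMPLE_HTML = """
<h2 class="jobTitle jobTitle-newJob">
  <div class="new"><span class="label">new</span></div>
  <span title="Data Engineer">Data Engineer</span>
</h2>
"""

h2 = BeautifulSoup(SAMPLE_HTML, "html.parser").find("h2")

# Default search walks all descendants, so it finds the nested span first.
first_span = h2.find("span")
# recursive=False restricts the search to direct children of <h2>.
direct_span = h2.find("span", recursive=False)

print(first_span.has_attr("title"))  # False, so first_span["title"] would raise KeyError
print(direct_span["title"])          # Data Engineer
```

This only helps when the wanted span really is a direct child of the h2; if the layout differs between postings, filtering on the attribute itself may be more robust.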

CodePudding user response：

Passing title=True ensures that only span tags that actually have a title attribute are matched, so accessing its value does not raise an error. A plain find("span") just returns the first span, regardless of the attributes you need.

result.find("span", title=True)['title']
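As a rough sketch of how this could be wrapped defensively: find returns None when nothing matches, so a posting without any titled span would otherwise fail with a different error. The helper name extract_title and the sample markup are hypothetical, introduced only for illustration:

```python
from bs4 import BeautifulSoup

def extract_title(card):
    """Return the title attribute of the first <span> that has one, else None."""
    # title=True matches only tags on which the title attribute is present.
    span = card.find("span", title=True)
    return span["title"] if span is not None else None

# Hypothetical job cards: one with a titled span, one without.
with_title = BeautifulSoup(
    '<h2><span>new</span><span title="Data Engineer">Data Engineer</span></h2>',
    "html.parser",
)
without_title = BeautifulSoup("<h2><span>new</span></h2>", "html.parser")

print(extract_title(with_title))     # Data Engineer
print(extract_title(without_title))  # None
```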

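The code and HTML you provided are ambiguous. The h2 tag's class attribute holds several values, `jobTitle jobTitle-color-purple jobTitle-newJob`. Beautiful Soup compares a class filter against each of those values separately, so result.find("h2", {"class": "jobTitle"}) does match this tag. If you want to match on a pattern instead, for example any class value that contains jobTitle, you can use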

import re
...

result.find("h2", class_=re.compile(r'jobTitle'))

Use a regular expression when you need a more flexible search in the soup.
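A small sketch of the regular-expression filter on made-up markup (the second h2 and its class name resultHeader are assumptions added for contrast). The pattern is tested against each class value of a tag, so one matching value is enough:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one job-title heading and one unrelated heading.
SAMPLE_HTML = (
    '<h2 class="jobTitle jobTitle-color-purple jobTitle-newJob">'
    '<span title="Data Engineer">Data Engineer</span></h2>'
    '<h2 class="resultHeader">Other</h2>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Matches any <h2> with at least one class value containing "jobTitle".
pattern = re.compile(r"jobTitle")
matches = soup.find_all("h2", class_=pattern)

print(len(matches))         # 1
print(matches[0]["class"])  # ['jobTitle', 'jobTitle-color-purple', 'jobTitle-newJob']
```

Because the pattern is unanchored, it would also match a tag whose only class is something like jobTitle-newJob; anchor it (for example with ^ and $) if that is too broad.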