How to extract text from a tag?

My code prints the elements with their HTML tags included, but I only want the text, not the tags.

When I try to get only the text with .get_text(), it throws an AttributeError:

'NoneType' object has no attribute 'get_text'
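Printing a Tag object renders it as HTML, while .get_text() returns only the text inside it. A minimal sketch on a made-up fragment (the markup is hypothetical, not Indeed's real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one job card
html = '<div class="slider_container"><h2><span title="Python Developer">Python Developer</span></h2></div>'
soup = BeautifulSoup(html, "html.parser")

# find() with a keyword argument filters on that HTML attribute (here: title)
tag = soup.find(title="Python Developer")
print(tag)             # <span title="Python Developer">Python Developer</span>
print(tag.get_text())  # Python Developer
```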

import requests
from bs4 import BeautifulSoup

url = requests.get("https://in.indeed.com/jobs?q=python developer&l=")

soup = BeautifulSoup(url.content,"html.parser")

parsed_file = soup.find(id = "resultsBody")


items = parsed_file.find_all(class_="slider_container")
for item in items:
    job_title = item.find(title='Python Developer').get_text()
    print(job_title)
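The AttributeError appears as soon as one card has no element with that title: .find() then returns None, and None has no get_text() method. A minimal offline reproduction, using hypothetical markup in place of the live page:

```python
from bs4 import BeautifulSoup

# Two hypothetical cards: only the first has an element with title="Python Developer"
html = """
<div id="resultsBody">
  <div class="slider_container"><span title="Python Developer">Python Developer</span></div>
  <div class="slider_container"><span title="Data Engineer">Data Engineer</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
items = soup.find(id="resultsBody").find_all(class_="slider_container")

errors = []
for item in items:
    found = item.find(title="Python Developer")  # None for the second card
    try:
        print(found.get_text())
    except AttributeError as exc:
        # Same error as in the question: calling get_text() on None
        errors.append(str(exc))
        print("AttributeError:", exc)
```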

Answer:

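Since you only want to print the jobs whose title is "Python Developer", you first need to check that such an element exists, that is, that .find() did not return None. .find() returns None when nothing matches, and calling .get_text() on None is what raises the AttributeError.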

Just put this check inside your for-loop.

for item in items:
    job_title = item.find(title='Python Developer')
    # Only print the text if a matching element was found
    if job_title is not None:
        print(job_title.get_text())
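As an alternative to the None check (a different technique from the one in this answer), a CSS attribute selector with .select() returns a list, which is simply empty when nothing matches, so there is nothing to guard against. A sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one matching card, one non-matching card
html = """
<div id="resultsBody">
  <div class="slider_container"><span title="Python Developer">Python Developer</span></div>
  <div class="slider_container"><span title="Data Engineer">Data Engineer</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a (possibly empty) list, never None
matches = soup.select('#resultsBody .slider_container [title="Python Developer"]')
titles = [tag.get_text(strip=True) for tag in matches]
print(titles)  # ['Python Developer']
```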

Answer:

.get_text() only works if your selection actually finds an element with that title. To fix this, first check that the result is not None:

for item in items:
    # Call find() once and reuse the result
    title_tag = item.find(title='Python Developer')
    job_title = title_tag.get_text() if title_tag else 'no result'
    print(job_title)
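If you need this "text or default" pattern in several places, it can be wrapped in a small function. `text_or_default` below is a hypothetical helper, not part of BeautifulSoup; it runs .find() once and falls back to a default string:

```python
from bs4 import BeautifulSoup

def text_or_default(parent, default="no result", **find_kwargs):
    """Hypothetical helper: call parent.find(**find_kwargs) once and return
    the element's stripped text, or `default` if nothing matched."""
    found = parent.find(**find_kwargs)
    return found.get_text(strip=True) if found is not None else default

# Hypothetical markup: the second <div> has no titled element
html = '<div><span title="Python Developer"> Python Developer </span></div><div></div>'
soup = BeautifulSoup(html, "html.parser")

results = [text_or_default(item, title="Python Developer") for item in soup.find_all("div")]
for result in results:
    print(result)
```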

Hint

Your selection could be more focused, so you can loop over the job cards more efficiently and also scrape additional info:

soup.select('#mosaic-provider-jobcards > a')
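The `>` in that selector is the CSS child combinator: it only matches `<a>` elements that are direct children of `#mosaic-provider-jobcards`, not links nested deeper. A sketch on made-up markup (not the real page structure):

```python
from bs4 import BeautifulSoup

# Made-up markup: two direct-child cards plus one <a> nested inside a <div>
html = """
<div id="mosaic-provider-jobcards">
  <a href="/job1"><h2>Python Developer</h2></a>
  <a href="/job2"><h2>Backend Engineer</h2></a>
  <div><a href="/ad">Sponsored link</a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Only <a> elements that are direct children of the container match
cards = soup.select('#mosaic-provider-jobcards > a')
hrefs = [card["href"] for card in cards]
print(hrefs)  # ['/job1', '/job2']
```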

Example

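import requests
from bs4 import BeautifulSoup

# Let requests build and encode the query string (q = search terms, l = location)
response = requests.get(
    "https://in.indeed.com/jobs",
    params={"q": "python developer", "l": ""},
)
soup = BeautifulSoup(response.content, "html.parser")

data = []

# Each job card is an <a> element directly under #mosaic-provider-jobcards
for item in soup.select('#mosaic-provider-jobcards > a'):
    # Skip cards without an element titled "Python Developer"
    if item.find(title='Python Developer'):
        # Assumes each card contains an <h2> and a nested <a>
        data.append({
            'title': item.h2.get_text(),
            'company': item.a.get_text(),
            '...': '...'
        })

print(data)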
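One way to make this easier to test is to separate fetching from parsing, so the parsing logic can run on a saved or hand-written HTML string. The sketch below assumes hypothetical markup: the `company` class name is made up for illustration and is not taken from Indeed's page, and the company field falls back to None when missing instead of raising:

```python
from bs4 import BeautifulSoup

def parse_cards(html):
    """Collect title/company dicts from cards that contain an element with
    title="Python Developer". The '.company' class is a hypothetical assumption."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for card in soup.select('#mosaic-provider-jobcards > a'):
        if card.find(title='Python Developer') is None:
            continue  # not a Python Developer card
        company = card.select_one('.company')
        data.append({
            'title': card.h2.get_text(strip=True),
            'company': company.get_text(strip=True) if company else None,
        })
    return data

# Hand-written sample standing in for the downloaded page
sample = """
<div id="mosaic-provider-jobcards">
  <a href="/1"><h2><span title="Python Developer">Python Developer</span></h2><span class="company">Acme</span></a>
  <a href="/2"><h2><span title="Java Developer">Java Developer</span></h2><span class="company">Globex</span></a>
</div>
"""
print(parse_cards(sample))  # [{'title': 'Python Developer', 'company': 'Acme'}]
```

With the live page you would pass `response.content` (or `response.text`) to `parse_cards` instead of the sample string.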