Scraping issue using BeautifulSoup and Selenium

Time:05-24

I am teaching myself to code and I am stuck on one line. Can you give me some explanation?

I want to scrape information from this div tag:

role = experience1_div('span', {'class' : 'mr1 t-bold'})
print(role)

Output :

[<span > <span aria-hidden="true"><!-- -->Automation Engineer - Intern<!-- --></span><span ><!-- -->Automation Engineer - Intern<!-- --></span> </span>]

How can I get only the text "Automation Engineer - Intern"?

I tried .get_text().strip(), but the nested span tag seems to get in the way.

CodePudding user response:

I don't know what experience1_div is, but to get all of the text, fetch a single tag with find() and use role.text:

role = experience1_div.find('span', {'class' : 'mr1 t-bold'}) 
print(role.text)

output: Automation Engineer - InternAutomation Engineer - Intern

The text is printed twice because both nested spans contain it. To get the text from the first nested span only, use role.span.text

or, for the second nested span, role.contents[2].text
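
For reference, here is a minimal sketch that runs the three options side by side. The HTML string is a hypothetical reconstruction based on the output shown in the question (the class on the outer span is assumed from the selector), so the indices may differ on the real page if its whitespace is different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the printed output in the question.
# The class attribute on the outer span is an assumption.
html = (
    '<div><span class="mr1 t-bold"> '
    '<span aria-hidden="true"><!-- -->Automation Engineer - Intern<!-- --></span>'
    '<span><!-- -->Automation Engineer - Intern<!-- --></span>'
    ' </span></div>'
)
experience1_div = BeautifulSoup(html, "html.parser").div

# find() returns a single Tag (or None), unlike find_all().
role = experience1_div.find("span", {"class": "mr1 t-bold"})

all_text = role.text.strip()         # text of both nested spans, concatenated
first_text = role.span.text          # first nested span only
second_text = role.contents[2].text  # contents[0] is the leading whitespace string

print(all_text)
print(first_text)
print(second_text)
```

Note that role.contents also counts whitespace strings between tags, which is why the second span sits at index 2 here; role.find_all('span')[1] would be less sensitive to that.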

CodePudding user response:

The main issue is that your call returns a ResultSet (a list of tags), not a single tag. To get the text, you have to pick an element by index or iterate over the ResultSet.

role[0].span.get_text(strip=True)

or

for e in role:
    print(e.span.get_text(strip=True))

Output:

Automation Engineer - Intern
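
To illustrate why the original .get_text() call fails, here is a small sketch. It assumes simplified, hypothetical markup (one outer span with the classes from the question and one nested span); calling a tag like a function is shorthand for find_all(), so the result is a ResultSet.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup; the real page will contain more.
html = (
    '<div><span class="mr1 t-bold">'
    '<span>Automation Engineer - Intern</span>'
    '</span></div>'
)
experience1_div = BeautifulSoup(html, "html.parser").div

# Same call as in the question: tag(...) behaves like tag.find_all(...).
roles = experience1_div("span", {"class": "mr1 t-bold"})

# A ResultSet is a list of tags and has no get_text() of its own.
try:
    roles.get_text()
    failed = False
except AttributeError:
    failed = True

# Iterating (or indexing) yields Tag objects, which do have get_text().
titles = [r.span.get_text(strip=True) for r in roles]

print(failed)
print(titles)
```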

A better approach would be to select the element more specifically (based on your example):

experience1_div.select_one('span.mr1.t-bold > span').get_text(strip=True)
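
One caveat: select_one() returns None when nothing matches, so chaining .get_text() directly raises an AttributeError on pages that lack the element. Below is a hedged sketch of a guard; the helper name first_role_text and the sample HTML are made up for illustration and are not part of the original code.

```python
from bs4 import BeautifulSoup


def first_role_text(container):
    """Return the text of the first span directly under span.mr1.t-bold, or None."""
    node = container.select_one("span.mr1.t-bold > span")
    # select_one() gives None when the selector matches nothing.
    return node.get_text(strip=True) if node is not None else None


# Hypothetical markup modelled on the question's output.
html = (
    '<div><span class="mr1 t-bold"> '
    '<span aria-hidden="true"><!-- -->Automation Engineer - Intern<!-- --></span>'
    '<span><!-- -->Automation Engineer - Intern<!-- --></span>'
    ' </span></div>'
)
experience1_div = BeautifulSoup(html, "html.parser").div
empty_div = BeautifulSoup("<div></div>", "html.parser").div

found = first_role_text(experience1_div)
missing = first_role_text(empty_div)

print(found)
print(missing)
```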