How to scrape the text of a span with aria-hidden="true"

Time:10-26

I am trying to web scrape using Selenium and Beautiful Soup, but I cannot get Selenium to find the element I need and return its text.

Here is the HTML:

<span >
            <span aria-hidden="true"><!---->Crédit Agricole CIB · Full-time<!----></span><span ><!---->Crédit Agricole CIB · Full-time<!----></span>
          </span>

How can I get the text 'Crédit Agricole CIB · Full-time' from this HTML?

I am trying to do something like this:

from bs4 import BeautifulSoup

src = driver.page_source                                             # HTML as rendered by Selenium
soup = BeautifulSoup(src, 'lxml')                                    # Parse it with Beautiful Soup
intro = soup.find('div', {'class': 'pv-text-details__left-panel'})

text_loc = intro.find( ???? )                                        # Locate the span holding the text
text = text_loc.get_text().strip()                                   # Remove surrounding whitespace

I do not know what to put in place of the ????.

Answer:

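I can't confirm this without seeing the full HTML (there might be other, very similarly nested elements before the snippet shared in the question), but if there aren't, you can use soup.select_one with the CSS selectors below. They assume the outer span has the classes t-14 and t-normal and the second inner span has the class visually-hidden; those class attributes appear to have been stripped from the snippet in the question.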

spanTxt1 = soup.select_one('span.t-14.t-normal span[aria-hidden="true"]')
if spanTxt1 is not None: spanTxt1 = spanTxt1.get_text(strip=True)

spanTxt2 = soup.select_one('span.t-14.t-normal span.visually-hidden')
if spanTxt2 is not None: spanTxt2 = spanTxt2.get_text(strip=True)

print(f' Text1: "{spanTxt1}" \n Text2: "{spanTxt2}" ')

This should print:

 Text1: "Crédit Agricole CIB · Full-time" 
 Text2: "Crédit Agricole CIB · Full-time" 
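For a self-contained check, the sketch below parses a hypothetical reconstruction of the snippet: the class names t-14, t-normal and visually-hidden are assumed from the selectors above, and the wrapping div is taken from the question's code. It uses Python's built-in html.parser instead of lxml so that bs4 is the only dependency. It also shows one way to fill in the ???? from the question: find() with an attrs dictionary, because aria-hidden contains a hyphen and cannot be written as a normal keyword argument.

```python
from bs4 import BeautifulSoup

# Hypothetical reconstruction of the question's snippet; the class names are
# assumed from the answer's CSS selectors, the div from the question's code.
html = """
<div class="pv-text-details__left-panel">
  <span class="t-14 t-normal">
    <span aria-hidden="true"><!---->Crédit Agricole CIB · Full-time<!----></span><span class="visually-hidden"><!---->Crédit Agricole CIB · Full-time<!----></span>
  </span>
</div>
"""

# html.parser ships with Python, so lxml is not needed for this sketch
soup = BeautifulSoup(html, "html.parser")

# Option 1: CSS selector, as in the answer (returns the first match or None)
css_match = soup.select_one('span.t-14.t-normal span[aria-hidden="true"]')
text_css = css_match.get_text(strip=True) if css_match is not None else None

# Option 2: filling the ???? in the question's code.
# "aria-hidden" has a hyphen, so it goes in the attrs dict rather than a keyword argument.
intro = soup.find("div", {"class": "pv-text-details__left-panel"})
text_loc = intro.find("span", attrs={"aria-hidden": "true"}) if intro is not None else None
text_find = text_loc.get_text(strip=True) if text_loc is not None else None

print(text_css)
print(text_find)
```

Both calls return only the first matching span, so on the real page the result depends on which matching element appears first in the document.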