BeautifulSoup: need help, can't parse out the text behind </span>text</a>

Time:05-07

Hello there, I'm getting really frustrated. I think I have tried it all: I've read the documentation on crummy.com, the Beautiful Soup 4 website, and I still can't get my head around this.

So to the question:

<a class="ellipsis" href="/aktier/om-aktien.html/5246/investor-a">
<span></span>Investor A</a>
<a class="ellipsis" href="/aktier/om-aktien.html/5247/investor-b">
<span></span>Investor B</a>

I only want the text that comes after </span>, i.e. "Investor A" and "Investor B".

This is my code:

def scrape(self):
    self.get(const.StockPicks)
    html = self.page_source
    soup = BeautifulSoup(html, "lxml")
    StockPage = soup.find_all("div", class_="orderbookListWrapper")
    StockNameBook = []
    for StockPages in StockPage:

        # find_all returns a ResultSet of <a> Tag objects, not their text
        StockName = StockPages.find_all("a", class_="ellipsis")
        StockNameBook.append(StockName)
        print(StockNameBook)

I have been at this for too long and I'm lost. Can you please help me out a bit?

Best regards,

A

Answer:

Your question is not very clear, but I guess you want to get the text inside the <a> tag.

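Here are three options: the first collects the link text, the second the href, and the third the text of the inner <span>. The third option raises an AttributeError if an <a> tag has no <span> inside it.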

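# Option 1: the visible link text, e.g. "Investor A"
StockNameBook.append([a.text.strip() for a in StockName])
# Option 2: the link target, e.g. "/aktier/om-aktien.html/5246/investor-a"
# StockNameBook.append([a.get("href") for a in StockName])
# Option 3: the text of the inner <span> (AttributeError if there is no <span>)
# StockNameBook.append([a.span.text for a in StockName])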
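A self-contained sketch for trying the three options above without Selenium. It assumes, as the question's code implies, that the links sit inside a div with class orderbookListWrapper and that each <a> has class ellipsis; the sample HTML is made up from the snippet in the question. It uses the built-in "html.parser" so lxml is not required ("lxml" behaves the same on this input), and it uses extend instead of append so the result is one flat list rather than a list of lists.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page built from the question's snippet.
# The wrapper div and the "ellipsis" class are assumptions taken from the question's code.
html = """
<div class="orderbookListWrapper">
  <a class="ellipsis" href="/aktier/om-aktien.html/5246/investor-a">
  <span></span>Investor A</a>
  <a class="ellipsis" href="/aktier/om-aktien.html/5247/investor-b">
  <span></span>Investor B</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

names, hrefs, span_texts = [], [], []
for wrapper in soup.find_all("div", class_="orderbookListWrapper"):
    links = wrapper.find_all("a", class_="ellipsis")
    # Option 1: visible link text; strip() removes the whitespace before the <span>
    names.extend(a.text.strip() for a in links)
    # Option 2: the href attribute (get() returns None if it is missing)
    hrefs.extend(a.get("href") for a in links)
    # Option 3: text of the inner <span>, guarded for links without a <span>
    span_texts.extend(a.span.text if a.span is not None else "" for a in links)

print(names)       # ['Investor A', 'Investor B']
print(hrefs)
print(span_texts)  # ['', ''] because the spans are empty
```

Only option 1 gives the names the question asks for; option 3 returns empty strings here because the sample spans have no text.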

Answer:

The text behind </span> is a text node of the <a> tag. So you have to select the <a> tags and then call the .get_text() method to get the value of their text nodes as a string.

html='''
<html>
 <body>
  <a class="ellipsis" href="/aktier/om-aktien.html/5246/investor-a">
   <span>
   </span>
   Investor A
  </a>
  <a class="ellipsis" href="/aktier/om-aktien.html/5247/investor-b">
   <span>
   </span>
   Investor B
  </a>
 </body>
</html>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html,'lxml')

#print(soup.prettify())

for a in soup.find_all('a', class_="ellipsis"):
    # strip=True strips each text node and drops the whitespace-only ones around the empty <span>
    txt = a.get_text(strip=True)
    print(txt)

Output:

Investor A
Investor B
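If the <span> is not always empty, get_text() also picks up the span's text. A minimal sketch of two alternatives that take only the text node sitting directly after </span>: reading span.next_sibling, and collecting only the direct text children of <a> with find_all(string=True, recursive=False). These are different techniques from the get_text() approach above, and the "NEW" label inside the span is a made-up value used only to show the difference. The code assumes some text always follows the span (otherwise next_sibling would be None).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same links as the sample above, but each <span>
# holds a made-up label so the approaches give different results.
html = """
<a class="ellipsis" href="/aktier/om-aktien.html/5246/investor-a">
<span>NEW</span>Investor A</a>
<a class="ellipsis" href="/aktier/om-aktien.html/5247/investor-b">
<span>NEW</span>Investor B</a>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() joins every text node under <a>, including the span's text
all_text = [a.get_text(strip=True) for a in soup.find_all("a", class_="ellipsis")]

# Alternative 1: the node right after </span> is the text we want
after_span = [span.next_sibling.strip() for span in soup.select("a.ellipsis > span")]

# Alternative 2: keep only text nodes that are direct children of <a>
direct_text = [
    "".join(s.strip() for s in a.find_all(string=True, recursive=False))
    for a in soup.find_all("a", class_="ellipsis")
]

print(all_text)     # ['NEWInvestor A', 'NEWInvestor B']
print(after_span)   # ['Investor A', 'Investor B']
print(direct_text)  # ['Investor A', 'Investor B']
```

With the empty spans from the question, all three lists would contain just the names, so get_text(strip=True) is enough there.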