Hello there, I'm getting really frustrated. I think I have tried everything: I've read Crummy and the documentation on the BeautifulSoup4 website, but I can't get my head around this.
So, to the question:
<a href="/aktier/om-aktien.html/5246/investor-a">
<span ></span>Investor A</a>
<a href="/aktier/om-aktien.html/5247/investor-b">
<span ></span>Investor B</a>
I only want the text that comes after </span> (here "Investor A" and "Investor B").
This is my code:
def scrape(self):
    self.get(const.StockPicks)
    html = self.page_source
    soup = BeautifulSoup(html, "lxml")
    StockPage = soup.find_all("div", class_="orderbookListWrapper")
    StockNameBook = []
    for StockPages in StockPage:
        StockName = StockPages.find_all("a", class_="ellipsis")
        StockNameBook.append(StockName)
    print(StockNameBook)
I have been trying for too long and I'm lost. Can you please help me out a bit?
Best regards,
A
Answer:
Your question is not very clear, but I guess you want to get the text inside the a tag.
Replace the append line in your loop with one of the options below. Note that the third option raises an error if an a tag has no span tag inside it.
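# Option 1: the visible text of each <a> tag, whitespace trimmed
StockNameBook.append(list(map(lambda x: x.text.strip(), StockName)))
# Option 2: the href attribute of each <a> tag
# StockNameBook.append(list(map(lambda x: x.get("href"), StockName)))
# Option 3: the text of the <span> inside each <a> (AttributeError if there is no <span>)
# StockNameBook.append(list(map(lambda x: x.span.text, StockName)))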
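Putting option 1 (and option 2) into the loop from the question could look roughly like the sketch below. It is an assumption-laden stand-in: instead of Selenium's self.page_source it parses a static HTML string, the class names orderbookListWrapper and ellipsis are taken from the question's code, the helper name scrape_names is hypothetical, and it uses the built-in "html.parser" instead of lxml so it runs without extra parsers.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for self.page_source, shaped like the question's markup
# (wrapper div and link class names copied from the question's code).
html = """
<div class="orderbookListWrapper">
  <a class="ellipsis" href="/aktier/om-aktien.html/5246/investor-a"><span></span>Investor A</a>
  <a class="ellipsis" href="/aktier/om-aktien.html/5247/investor-b"><span></span>Investor B</a>
</div>
"""


def scrape_names(page_source):
    """Return (names, links): one list per wrapper div, like StockNameBook."""
    soup = BeautifulSoup(page_source, "html.parser")
    names, links = [], []
    for wrapper in soup.find_all("div", class_="orderbookListWrapper"):
        anchors = wrapper.find_all("a", class_="ellipsis")
        # Option 1: visible text of each <a>, surrounding whitespace trimmed
        names.append([a.get_text(strip=True) for a in anchors])
        # Option 2: the href attribute of each <a>
        links.append([a.get("href") for a in anchors])
    return names, links


names, links = scrape_names(html)
print(names)
print(links)
```

The list comprehensions do the same job as list(map(lambda ...)) but read a little more directly; the key point is that each tag is turned into a string before it is appended, instead of appending the raw ResultSet of tags.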
Answer:
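The text after </span> ("Investor A", "Investor B") is a text node of the a tag. So you have to select the a tag and then call its .get_text() method, which joins the text nodes inside it into one string (strip=True also trims the surrounding whitespace).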
from bs4 import BeautifulSoup

html = '''
<html>
<body>
<a class="ellipsis" href="/aktier/om-aktien.html/5246/investor-a">
<span>
</span>
Investor A
</a>
<a class="ellipsis" href="/aktier/om-aktien.html/5247/investor-b">
<span>
</span>
Investor B
</a>
</body>
</html>
'''

soup = BeautifulSoup(html, 'lxml')
# print(soup.prettify())
for a in soup.find_all('a', class_="ellipsis"):
    # strip=True drops the whitespace-only text nodes and trims the rest
    txt = a.get_text(strip=True)
    print(txt)
Output:
Investor A
Investor B
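If you strictly want the text that follows </span>, and not any text that might sit inside the span itself, you can read only the a tag's own text nodes. A minimal sketch, assuming markup shaped like the question's; the span is given hypothetical text here just to show that it is skipped. It uses find_all(string=True, recursive=False), which returns only the strings that are direct children of the tag.

```python
from bs4 import BeautifulSoup

# Markup shaped like the question's; "icon" inside the span is hypothetical,
# added only to show that text inside the span is left out.
html = """
<a class="ellipsis" href="/aktier/om-aktien.html/5246/investor-a"><span>icon</span>Investor A</a>
<a class="ellipsis" href="/aktier/om-aktien.html/5247/investor-b"><span>icon</span>Investor B</a>
"""

soup = BeautifulSoup(html, "html.parser")
names = []
for a in soup.find_all("a", class_="ellipsis"):
    # recursive=False: only strings that are direct children of <a>,
    # so the text nested inside <span> is not included
    direct = a.find_all(string=True, recursive=False)
    names.append("".join(direct).strip())

print(names)  # ['Investor A', 'Investor B']
```

a.span.next_sibling would also reach the text node right after the span (you would still need .strip() when the markup has line breaks), but like the third option in the first answer it fails when an a tag has no span.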