for link in soup.findAll('li'):
    if "c-listing__authors-list" in str(link):
        # theAuthor = link.string
        theAuthor = str(link).replace("</p>","")
        theAuthor = theAuthor.split("</span>")[1]
        listAuthor.append(theAuthor)
CodePudding user response:
Select the spans directly and use get_text(strip=True) to pull out their text:
for e in soup.select('li span.c-listing__authors-list'):
    theAuthor = e.get_text(strip=True)
Or, to build the whole list in one line:
theAuthor = [e.get_text(strip=True) for e in soup.select('li span.c-listing__authors-list')]
Example
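from bs4 import BeautifulSoup

html = '''
<ul>
<li><span class="c-listing__authors-list">a</span></li>
<li><span class="c-listing__authors-list">b</span></li>
<li><span>no list</span></li>
</ul>
'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

theAuthor = []
for e in soup.select('li span.c-listing__authors-list'):
    theAuthor.append(e.get_text(strip=True))
print(theAuthor)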
Output
['a', 'b']
CodePudding user response:
This answer is Microsoft (.NET) centric, but I hope it helps point you in the right direction.
It's been a while since I wrote a scraper, but I think this is possible if you know XPath: as I recall, you can read a web page into an HTMLDocument, select the element you need with XPath, and then read its text value.
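For what it's worth, the same XPath idea can be sketched in Python, the language of the question. This is only a rough illustration, not the .NET code described above: it assumes the page fragment is well-formed, uses the standard library's xml.etree.ElementTree (which supports only a limited XPath subset), and runs against made-up sample markup that reuses the class name from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; only the class name comes from the question.
html = """
<ul>
<li><span class="c-listing__authors-list">a</span></li>
<li><span class="c-listing__authors-list">b</span></li>
<li><span>no list</span></li>
</ul>
"""

# ElementTree expects well-formed XML, so this works only on clean markup.
root = ET.fromstring(html.strip())

# XPath: every <span> with the target class that is a direct child of an <li>.
spans = root.findall(".//li/span[@class='c-listing__authors-list']")

# Join all text inside each span and trim the surrounding whitespace.
authors = ["".join(span.itertext()).strip() for span in spans]
print(authors)  # ['a', 'b']
```

Note that the [@class='...'] predicate compares the whole attribute value, so a span carrying several classes would not match the way it does with a CSS selector. Real pages are rarely well-formed XML either, so in practice a lenient HTML parser with XPath support would probably be needed instead.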