I am new to web scraping and am trying to scrape an unordered list. I want the list items to end up on a single line, separated by commas. This is the list I want to scrape:
<div class="spec-list attributes-modality">
<h5 class="spec-subcat">Modality</h5>
<div class="col-split-xs-1 col-split-md-1">
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family</li>
<li class="">Group</li>
</ul></div></div>
This is my attempt:
modalitydiv = soup.find('div', class_='spec-list attributes-modality')
modality = modalitydiv.find('ul', class_='attribute-list copy-small').text.strip()
My attempt only gives me the list items, one per line:
Individuals
Family
Group
Why don't these appear on the same line, and how can I scrape the list items into a comma-separated list? Can someone help, please?
Answer:
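What happens
Calling .text on the <ul> returns all of its text, including the newline characters between the <li> tags. strip() only removes leading and trailing whitespace, so the newlines between the items remain.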
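To see what .text actually returns, print its repr before and after strip(). A minimal sketch using the HTML from the question; the variable names raw and stripped are just for illustration:

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="spec-list attributes-modality">
<h5 class="spec-subcat">Modality</h5>
<div class="col-split-xs-1 col-split-md-1">
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family</li>
<li class="">Group</li>
</ul></div></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
ul = soup.find("ul", class_="attribute-list copy-small")

raw = ul.text           # all text inside <ul>, including the newlines between the <li> tags
stripped = raw.strip()  # only the outer newlines are removed; the inner ones stay

print(repr(raw))       # '\nIndividuals\nFamily\nGroup\n'
print(repr(stripped))  # 'Individuals\nFamily\nGroup'
```

The inner \n characters are why the items print on separate lines.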
Solution
Use split() instead of strip() to split the string into a list:
modalitydiv = soup.find('div', class_='spec-list attributes-modality')
modality = modalitydiv.find('ul', class_='attribute-list copy-small').text.split()
Output
['Individuals', 'Family', 'Group']
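Note that split() with no arguments splits on any run of whitespace, so an item containing a space would be broken into several entries. A minimal sketch, assuming a hypothetical item "Young Adults" in the list, that compares split() with BeautifulSoup's stripped_strings generator and then joins the result into a comma-separated string:

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's list with a multi-word item.
html_doc = """
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family</li>
<li class="">Young Adults</li>
</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")
ul = soup.find("ul", class_="attribute-list copy-small")

# split() also breaks on the space inside "Young Adults".
naive = ul.text.split()

# stripped_strings yields each text node with surrounding whitespace removed
# and skips whitespace-only nodes, so multi-word items stay intact.
items = list(ul.stripped_strings)

print(naive)             # ['Individuals', 'Family', 'Young', 'Adults']
print(items)             # ['Individuals', 'Family', 'Young Adults']
print(", ".join(items))  # Individuals, Family, Young Adults
```

For the list in the question, both approaches give the same three items; the difference only shows up once an item contains whitespace.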
Answer:
Selecting each <li> with a CSS selector and joining their texts with ', ' gives the comma-separated string directly:
from bs4 import BeautifulSoup
html_doc="""
<div class="spec-list attributes-modality">
<h5 class="spec-subcat">Modality</h5>
<div class="col-split-xs-1 col-split-md-1">
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family</li>
<li class="">Group</li>
</ul></div></div>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
# select each <li> that is a direct child of the target <ul>, trim its text, and join with ", "
p = ', '.join(x.get_text(strip=True) for x in soup.select('ul.attribute-list.copy-small > li'))
print(p)
Output:
Individuals, Family, Group
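The same result can be reached with the find()/find_all() calls from the question. A minimal sketch; the helper name modality_items is hypothetical, and it assumes an empty result is preferable to an AttributeError when a page lacks the expected <div> (find() returns None when nothing matches):

```python
from bs4 import BeautifulSoup


def modality_items(soup):
    """Hypothetical helper: return the <li> texts of the modality block, or [] if it is missing."""
    modalitydiv = soup.find("div", class_="spec-list attributes-modality")
    if modalitydiv is None:  # find() returns None when no element matches
        return []
    ul = modalitydiv.find("ul", class_="attribute-list copy-small")
    if ul is None:
        return []
    # get_text(strip=True) trims the whitespace around each item's text
    return [li.get_text(strip=True) for li in ul.find_all("li")]


html_doc = """
<div class="spec-list attributes-modality">
<h5 class="spec-subcat">Modality</h5>
<div class="col-split-xs-1 col-split-md-1">
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family</li>
<li class="">Group</li>
</ul></div></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
modality = ", ".join(modality_items(soup))
print(modality)  # Individuals, Family, Group

empty = BeautifulSoup("<p>no list here</p>", "html.parser")
print(modality_items(empty))  # []
```

Returning a list keeps the items usable on their own; joining with ", " is left to the caller.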