Scrape list items into comma separated values

Time:10-06

I am new to web scraping and am trying to scrape an unordered list. I want the list items collected into a single comma-separated string. This is the list I want to scrape:

<div class="spec-list attributes-modality">
<h5 class="spec-subcat">Modality</h5>
<div class="col-split-xs-1 col-split-md-1">
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family</li>
<li class="">Group</li>
</ul></div></div>

This is my try:

modalitydiv = soup.find('div', class_='spec-list attributes-modality')
modality = modalitydiv.find('ul', class_='attribute-list copy-small').text.strip()

My attempt only gives me the list items one per line:

Individuals

Family

Group

Why do these not appear on the same line, and how can I scrape these list items into a comma-separated list? Can someone help, please?

CodePudding user response:

What happens

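You are using strip(), which removes only leading and trailing whitespace. The line breaks between the list items are still part of the text, so each item prints on its own line.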
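To see the effect in isolation, here is a small sketch on a plain string. The value of raw is an assumption: it is what .text would return for the <ul> in the question if the markup contains exactly the line breaks shown there.

```python
# Assumed value of ul.text for the markup in the question:
# one newline before, between, and after the <li> texts.
raw = "\nIndividuals\nFamily\nGroup\n"

# strip() trims whitespace only at the two ends; inner newlines survive.
stripped = raw.strip()
print(repr(stripped))

# Splitting on line breaks and joining gives a single comma-separated line.
joined = ", ".join(stripped.splitlines())
print(joined)
```

The first print shows that the newlines between the items are still present after strip(); the second shows the items on one line.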

Solution

Use split() instead of strip() to split the string into a list:

modalitydiv = soup.find('div', class_='spec-list attributes-modality')
modality = modalitydiv.find('ul', class_='attribute-list copy-small').text.split()

Output

['Individuals', 'Family', 'Group']
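If a single comma-separated string is needed rather than a Python list, the list can be joined. Note that split() breaks on any whitespace, so an item made of several words would be split apart. The sketch below may be a safer variant: it uses stripped_strings, which yields each text node with surrounding whitespace removed. The "Family Therapy" item is a hypothetical addition, not part of the original page, used only to show the difference.

```python
from bs4 import BeautifulSoup

# Markup based on the question; "Family Therapy" is a hypothetical multi-word item.
html_doc = """
<div class="spec-list attributes-modality">
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family Therapy</li>
<li class="">Group</li>
</ul></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")
ul = soup.find("ul", class_="attribute-list copy-small")

# split() cuts on every whitespace run, so the multi-word item is broken in two.
words = ul.text.split()

# stripped_strings yields one whitespace-trimmed string per text node.
modality = ", ".join(ul.stripped_strings)
print(modality)
```

For the three single-word items in the question, both approaches give the same items; they differ only when an item contains spaces.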

CodePudding user response:

Selecting the li elements with a CSS selector and joining their text gives me the desired output:

from bs4 import BeautifulSoup


html_doc="""

<div class="spec-list attributes-modality">
<h5 class="spec-subcat">Modality</h5>
<div class="col-split-xs-1 col-split-md-1">
<ul class="attribute-list copy-small">
<li class="">Individuals</li>
<li class="">Family</li>
<li class="">Group</li>
</ul></div></div>

"""

soup = BeautifulSoup(html_doc, 'html.parser')

p = ', '.join(x.get_text(strip=True) for x in soup.select('ul.attribute-list.copy-small > li'))

print(p)

Output:

Individuals, Family, Group