HTML element is not available when scraping

Time:10-07

I am new to web scraping and have started a project. I am scraping a website that has many pages. On some pages the data I want is present, and on others it is not, so I tried to handle this with an if/else check on whether the element exists. It does not seem to work for some elements. Can someone explain what the issue is and how to fix it?

What I want to scrape is:

<div class="spec-list attributes-sexuality">
<h5 class="spec-subcat">Sexuality</h5>
<div class="col-split-xs-1 col-split-md-2">
<ul class="attribute-list copy-small">
<li class="">Bisexual</li>
<li class="">Gay</li>
<li class="">Lesbian</li>
</ul></div></div>

This element is missing on some pages. This is my attempt:

if soup.find('div', class_='spec-list attributes-sexuality'):
    sexuality = ', '.join([x.get_text(strip=True) for x in soup.find('div', class_='spec-list attributes-sexuality').find('ul.attribute-list.copy-small>li')])
else:
    sexuality = ''

The error I get:

'NoneType' object is not iterable

Why does it return None when the loop should only run if the element is present? How can I overcome this?

CodePudding user response:

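The outer check is fine; the None comes from the inner call. find() expects a tag name (plus optional attribute filters), not a CSS selector, so find('ul.attribute-list.copy-small>li') looks for a tag literally named that, matches nothing, and returns None. Iterating over None then raises the error. Look the div up once, then use find() for the ul and find_all() for its li items, like this: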

import bs4 as bs

s = """
<div class="spec-list attributes-sexuality">
<h5 class="spec-subcat">Sexuality</h5>
<div class="col-split-xs-1 col-split-md-2">
<ul class="attribute-list copy-small">
<li class="">Bisexual</li>
<li class="">Gay</li>
<li class="">Lesbian</li>
</ul></div></div>
"""

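soup = bs.BeautifulSoup(s, 'lxml')  # the 'lxml' parser requires the lxml package
# Look the div up once and reuse the result
temp = soup.find('div', class_='spec-list attributes-sexuality')
if temp:
    # find() returns the single <ul>; find_all() returns a list of its <li> tags
    sexuality = ', '.join([x.get_text(strip=True) for x in temp.find('ul', class_='attribute-list copy-small').find_all('li')])
else:
    sexuality = ''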
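
If you prefer the CSS selector from the question, select() is the method that accepts one. It returns a list that is simply empty when nothing matches, so the if/else is not needed. The following is only a sketch: the function name extract_sexuality and the two sample pages are made up for illustration, and it uses the built-in 'html.parser' instead of 'lxml' so it runs without extra packages beyond bs4.

```python
from bs4 import BeautifulSoup


def extract_sexuality(html: str) -> str:
    """Join the <li> texts with ', ', or return '' when the block is absent.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select() takes a CSS selector and always returns a list,
    # which is empty when nothing matches, so no None check is needed.
    items = soup.select(
        "div.spec-list.attributes-sexuality ul.attribute-list.copy-small > li"
    )
    return ", ".join(li.get_text(strip=True) for li in items)


# Sample page that contains the block (copied from the question).
page_with_block = """
<div class="spec-list attributes-sexuality">
<h5 class="spec-subcat">Sexuality</h5>
<div class="col-split-xs-1 col-split-md-2">
<ul class="attribute-list copy-small">
<li class="">Bisexual</li>
<li class="">Gay</li>
<li class="">Lesbian</li>
</ul></div></div>
"""

# Made-up sample page where the block is missing.
page_without_block = '<div class="spec-list attributes-other"></div>'

print(extract_sexuality(page_with_block))           # Bisexual, Gay, Lesbian
print(repr(extract_sexuality(page_without_block)))  # ''

# For comparison: find() treats the selector string as a tag name and finds nothing.
soup = BeautifulSoup(page_with_block, "html.parser")
print(soup.find("ul.attribute-list.copy-small>li"))  # None
```

Calling the helper once per downloaded page gives either the joined values or an empty string, which is the same result the if/else version produces.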