How to check if a specific class is present using BeautifulSoup?

Time:03-29

HTML

<div class='abc'>
  <ul>
    <li class='bg-dark contentPrice'>
      Some text
    </li>
    <li class ='bg-dark contentDate'>
       Some text
    </li>
  </ul>
</div>
...
<div class='abc'>
  <ul>
    <li class='bg-dark contentPrice'>
      Some text
    </li>
    <li class ='bg-dark contentRelease'>
       Some text
    </li>
  </ul>
</div>

Python

def getPrice():
    narr = soup.findAll('li', class_='contentPrice')
    return narr

def getDate():
    date = soup.findAll('li', class_='contentDate')
    return date

Problem: I want to get the text if the class contentDate is present and return None if contentRelease is present. I'm unsure how to achieve this!

Problem I'm facing: In my code, I'm using a loop to assign each price and date to a dictionary. For a <div> that has no contentDate element, no value is returned, and the loop raises an IndexError.

Any guidance or help would be much appreciated. If you need any more info please let me know!

Note: If it can be easily done with any other library, that would be okay too!

CodePudding user response:

Just to point you in a direction: process the data one <div> at a time and store the results in a more structured way:

for r in soup.find_all('div', {'class':'abc'}):
    data.append({
        'contentPrice': e.text.strip() if(e := r.find('li', {'class':'contentPrice'})) else None,
        'contentDate': e.text.strip() if(e := r.find('li', {'class':'contentDate'})) else None
    })

Note: In new code, use find_all() instead of the older findAll() syntax.

Example

from bs4 import BeautifulSoup

html = '''
<div class="abc">
  <ul>
    <li class="bg-dark contentPrice">
      Some text
    </li>
    <li class ="bg-dark contentDate">
       Some text
    </li>
  </ul>
</div>
...
<div class="abc">
  <ul>
    <li class="bg-dark contentPrice">
      Some text
    </li>
    <li class ="bg-dark contentRelease">
       Some text
    </li>
  </ul>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

data = []

for r in soup.find_all('div', {'class':'abc'}):
    data.append({
        'contentPrice': e.text.strip() if(e := r.find('li', {'class':'contentPrice'})) else None,
        'contentDate': e.text.strip() if(e := r.find('li', {'class':'contentDate'})) else None
    })

data

Output

[{'contentPrice': 'Some text', 'contentDate': 'Some text'},
 {'contentPrice': 'Some text', 'contentDate': None}]

You can also simply create a DataFrame from here:

import pandas as pd

pd.DataFrame(data)
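If you are on a Python version older than 3.8, the `:=` operator is not available. A minimal sketch of the same idea with a small helper follows. The name `text_or_none` and the sample texts are made up for illustration and are not part of BeautifulSoup or the original page.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the texts are placeholders.
html = '''
<div class="abc">
  <ul>
    <li class="bg-dark contentPrice">Price text</li>
    <li class="bg-dark contentDate">Date text</li>
  </ul>
</div>
<div class="abc">
  <ul>
    <li class="bg-dark contentPrice">Price text</li>
    <li class="bg-dark contentRelease">Release text</li>
  </ul>
</div>
'''


def text_or_none(parent, css_class):
    """Hypothetical helper: stripped text of the first matching <li>, or None."""
    tag = parent.find('li', class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(html, 'html.parser')

# One dictionary per <div class="abc">, so price and date stay paired.
rows = [
    {
        'contentPrice': text_or_none(div, 'contentPrice'),
        'contentDate': text_or_none(div, 'contentDate'),
    }
    for div in soup.find_all('div', class_='abc')
]

print(rows)
```

Searching inside each <div> rather than across the whole page is what keeps the price and date aligned, so a missing date becomes None instead of shifting the later entries.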

CodePudding user response:

You can do that by applying try/except:

from bs4 import BeautifulSoup

txt='''
<div class='abc'>
  <ul>
    <li class='bg-dark contentPrice'>
      Some text
    </li>
    <li class ='bg-dark contentDate'>
       Some text Date
    </li>
  </ul>
</div>
...
<div class='abc'>
  <ul>
    <li class='bg-dark contentPrice'>
      Some text
    </li>
    <li class ='bg-dark contentRelease'>
       Some text Release
    </li>
  </ul>
</div>
'''
soup=BeautifulSoup(txt,'html.parser')

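for item in soup.select('div.abc > ul'):
    # select_one() returns None when nothing matches,
    # so accessing .text raises AttributeError in that case
    try:
        date = item.select_one('.contentDate').text
        print(date)
    except AttributeError:
        pass

    try:
        release = item.select_one('.contentRelease').text
        print(release)
    except AttributeError:
        pass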

Output:

Some text Date

Some text Release
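The exception handling can also be avoided, because `select_one()` returns `None` when nothing matches. This is a rough sketch under the same sample markup, and the `dates` list is only an illustrative name. It collects the date text for each block and `None` where only contentRelease is present.

```python
from bs4 import BeautifulSoup

txt = '''
<div class='abc'>
  <ul>
    <li class='bg-dark contentPrice'>Some text</li>
    <li class='bg-dark contentDate'>Some text Date</li>
  </ul>
</div>
<div class='abc'>
  <ul>
    <li class='bg-dark contentPrice'>Some text</li>
    <li class='bg-dark contentRelease'>Some text Release</li>
  </ul>
</div>
'''

soup = BeautifulSoup(txt, 'html.parser')

dates = []  # illustrative name: one entry per <ul>, None when no date exists
for item in soup.select('div.abc > ul'):
    date_tag = item.select_one('.contentDate')
    # Check for None explicitly instead of catching AttributeError
    dates.append(date_tag.get_text(strip=True) if date_tag is not None else None)

print(dates)
```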
