HTML
<div class='abc'>
<ul>
<li class='bg-dark contentPrice'>
Some text
</li>
<li class ='bg-dark contentDate'>
Some text
</li>
</ul>
</div>
...
<div class='abc'>
<ul>
<li class='bg-dark contentPrice'>
Some text
</li>
<li class ='bg-dark contentRelease'>
Some text
</li>
</ul>
</div>
Python
def getPrice():
    narr = soup.findAll('li', class_='contentPrice')
    return narr

def getDate():
    date = soup.findAll('li', class_='contentDate')
    return date
Problem: I want to get the text if the class contentDate is present, and return None if contentRelease is present instead. I'm unsure how to achieve this!
Problem I'm facing: In my code, I'm using a loop to assign each individual price and date to a dictionary. For the <div> which doesn't have a contentDate, no value is returned, and the loop raises an IndexError.
Any guidance or help would be much appreciated. If you need any more info please let me know!
Note: If it can be easily done with any other library, that would be okay too!
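The loop itself isn't shown above, so the following is only a guess at how the IndexError comes about: it assumes the prices and dates are collected into two flat lists and then indexed in parallel. The names prices, dates and rows are made up for the illustration.

```python
from bs4 import BeautifulSoup

html = """
<div class='abc'><ul>
  <li class='bg-dark contentPrice'>Some text</li>
  <li class='bg-dark contentDate'>Some text</li>
</ul></div>
<div class='abc'><ul>
  <li class='bg-dark contentPrice'>Some text</li>
  <li class='bg-dark contentRelease'>Some text</li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Document-wide searches: one price per <div>, but a date only where it exists.
prices = soup.find_all("li", class_="contentPrice")
dates = soup.find_all("li", class_="contentDate")
print(len(prices), len(dates))

rows = []
try:
    # Assumed loop: index both lists with the same counter.
    for i in range(len(prices)):
        rows.append({"price": prices[i].text.strip(),
                     "date": dates[i].text.strip()})
except IndexError as exc:
    # The second <div> has no contentDate, so dates[1] does not exist.
    print("IndexError:", exc)
```

With lists of different lengths there is also no way to tell which <div> a given date belongs to, which is why searching inside each <div> separately is the safer route.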
CodePudding user response:
Just to point in a direction: iterate over each <div> and look up both elements inside it, so a missing element simply becomes None, and store the results in a more structured way:
data = []
for r in soup.find_all('div', {'class':'abc'}):
    data.append({
        'contentPrice': e.text.strip() if (e := r.find('li', {'class':'contentPrice'})) else None,
        'contentDate': e.text.strip() if (e := r.find('li', {'class':'contentDate'})) else None
    })
Note: In new code, please use find_all() instead of the old syntax findAll().
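If the walrus operator (:=, Python 3.8+) isn't available or feels too dense, roughly the same thing can be written with a small helper. This is only a sketch: text_or_none is a made-up name, not part of BeautifulSoup, and the HTML is a condensed copy of the sample from the question.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, css_class):
    """Return the stripped text of the first <li> with css_class inside parent, or None."""
    tag = parent.find("li", class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


html = """
<div class='abc'><ul>
  <li class='bg-dark contentPrice'>Some text</li>
  <li class='bg-dark contentDate'>Some text</li>
</ul></div>
<div class='abc'><ul>
  <li class='bg-dark contentPrice'>Some text</li>
  <li class='bg-dark contentRelease'>Some text</li>
</ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# One dictionary per <div>; a missing <li> yields None instead of an error.
data = [
    {
        "contentPrice": text_or_none(div, "contentPrice"),
        "contentDate": text_or_none(div, "contentDate"),
    }
    for div in soup.find_all("div", class_="abc")
]
print(data)
```

Because every lookup is scoped to a single <div>, each price stays paired with the date from the same block.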
Example
from bs4 import BeautifulSoup

html = '''
<div class="abc">
<ul>
<li class="bg-dark contentPrice">
Some text
</li>
<li class="bg-dark contentDate">
Some text
</li>
</ul>
</div>
...
<div class="abc">
<ul>
<li class="bg-dark contentPrice">
Some text
</li>
<li class="bg-dark contentRelease">
Some text
</li>
</ul>
</div>
'''
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
data = []
for r in soup.find_all('div', {'class':'abc'}):
    data.append({
        'contentPrice': e.text.strip() if (e := r.find('li', {'class':'contentPrice'})) else None,
        'contentDate': e.text.strip() if (e := r.find('li', {'class':'contentDate'})) else None
    })
data
Output
[{'contentPrice': 'Some text', 'contentDate': 'Some text'},{'contentPrice': 'Some text', 'contentDate': None}]
You can also simply create a DataFrame from here (with pandas imported as pd):
pd.DataFrame(data)
CodePudding user response:
You can do that by applying try/except:
from bs4 import BeautifulSoup
txt='''
<div class='abc'>
<ul>
<li class='bg-dark contentPrice'>
Some text
</li>
<li class ='bg-dark contentDate'>
Some text Date
</li>
</ul>
</div>
...
<div class='abc'>
<ul>
<li class='bg-dark contentPrice'>
Some text
</li>
<li class ='bg-dark contentRelease'>
Some text Release
</li>
</ul>
</div>
'''
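soup = BeautifulSoup(txt, 'html.parser')
for item in soup.select('div.abc > ul'):
    try:
        # select_one() returns None when nothing matches, so .text raises AttributeError
        date = item.select_one('.contentDate').text.strip()
        print(date)
    except AttributeError:
        pass
    try:
        release = item.select_one('.contentRelease').text.strip()
        print(release)
    except AttributeError:
        pass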
Output:
Some text Date
Some text Release
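Since select_one() returns None when nothing matches, the exception can also be avoided by checking the result first. A minimal sketch of that variant, which collects exactly what the question asks for (the date text, or None where only contentRelease is present); the list name dates is just for illustration:

```python
from bs4 import BeautifulSoup

txt = """
<div class='abc'><ul>
  <li class='bg-dark contentPrice'>Some text</li>
  <li class='bg-dark contentDate'>Some text Date</li>
</ul></div>
<div class='abc'><ul>
  <li class='bg-dark contentPrice'>Some text</li>
  <li class='bg-dark contentRelease'>Some text Release</li>
</ul></div>
"""
soup = BeautifulSoup(txt, "html.parser")

dates = []
for item in soup.select("div.abc > ul"):
    tag = item.select_one("li.contentDate")  # None if this block has no date
    dates.append(tag.get_text(strip=True) if tag is not None else None)

print(dates)
```

This keeps one entry per <div>, so the result lines up with a price list built the same way.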