I want to scrape data from a website that generates its cards using JavaScript. I tried the following (for information, I am using ChromeDriver):
Title = []
driver.get(url)
content = driver.page_source
soup = BeautifulSoup(content,'html.parser')
for a in soup.findAll('div',attrs={'class':'card-body text-center'}):
    title = a.find('h4',attrs={'class':'card-title'})
    Title.append(title.text)
print(Title)
My list Title comes back empty. The HTML of the site I am trying to scrape looks like this:
<div class="card-body text-center">
<h4 class="card-title">
Title</h4>
<h7>Risen_star</h7>
<br>
<h7>2022</h7>
<p style="height:57px" >
hi
</p>
<a href="/details" >
Read More
</a>
</div>
CodePudding user response:
It looks like things are working fine:
from bs4 import BeautifulSoup
Title = []
content = """<div class="card-body text-center">
<h4 class="card-title">
Title</h4>
<h7>Risen_star</h7>
<br>
<h7>2022</h7>
<p style="height:57px" >
hi
</p>
<a href="/details" >
Read More
</a>
</div>
"""
soup = BeautifulSoup(content,'html.parser')
for a in soup.findAll('div',attrs={'class':'card-body text-center'}):
    title = a.find('h4',attrs={'class':'card-title'})
    Title.append(title.text)
print(Title)
This results in:
$ python3 test.py
['\nTitle']
Maybe your content variable is not correctly populated with the HTML page content.
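If the cards are rendered by JavaScript after the page loads, driver.page_source may have been read before they existed. Below is a minimal sketch of polling the page source until the cards show up. The names wait_for_html, get_source and is_ready are hypothetical helpers made up for this illustration; they are not part of Selenium or BeautifulSoup. The 10 second timeout is an arbitrary assumption.

```python
import time


def wait_for_html(get_source, is_ready, timeout=10.0, interval=0.5):
    """Call get_source() repeatedly until is_ready(html) is true.

    get_source: zero-argument callable returning the current HTML string
                (hypothetical; with Selenium it could wrap driver.page_source).
    is_ready:   callable taking the HTML and returning True once the
                expected content is present.
    Raises TimeoutError if the content never appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = get_source()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError("expected content did not appear in the page source")
        time.sleep(interval)


# Possible usage with the question's driver (assumes `driver` and `url` exist):
# driver.get(url)
# content = wait_for_html(lambda: driver.page_source,
#                         lambda html: "card-title" in html)
```

If this raises TimeoutError, the cards never reached the page source at all, which points to a different problem (for example, the content living inside an iframe). Selenium also ships its own explicit-wait mechanism, WebDriverWait, which may be a better fit than hand-rolled polling.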
CodePudding user response:
data = []
soup = BeautifulSoup(content,'html.parser')
for a in soup.findAll('div',attrs={'class':'card-body text-center'}):
    element = {}
    title = a.find('h4',attrs={'class':'card-title'})
    sub_title = a.find('h7')
    # strip=True drops the newline that precedes the title text in the HTML
    element['title'] = title.get_text(strip=True)
    element['sub_title'] = sub_title.get_text(strip=True)
    data.append([element])
print(data)
Something like the above adds one dictionary per card to a list. get_text(strip=True) returns the element's text, like .text, but without the surrounding whitespace. It prints:
[[{'title': 'Title', 'sub_title': 'Risen_star'}]]