I am new to web scraping and I am using BeautifulSoup for it.
My problem is that when I copy the desired contents from one list, which contains some tags, into a second list, the second list has some empty entries.
Here is the list I am getting the values from:
From this list I want to create a new one that contains only the reviews, so I use:
names = []
for item in basic_info:
    for i in item:
        names.append(i.find_all("p", attrs={"class": "review-body"}))
The problem is that the output looks like this:
So instead of getting the values one after another, I get them in every other position in the list: the first entry is empty, the second has data, the third is empty, the fourth has data, and so on.
CodePudding user response:
Note: Providing all relevant information as text instead of as an image would be great.
Assuming you want to extract several pieces of information per review, you should select all the review containers and iterate over them to scrape and store structured data:
data = []
for e in soup.select('div.consumer-review-container'):
    data.append({
        'review-title': e.h3.text,
        'review-type': e.select_one('div.review-type').text,
        'review-section': e.select_one('p.review-body').text
    })
Example
sample = '''
<div class="consumer-review-container">
<h3>Excellent Car</h3>
<div>
<div>June 8, 2021</div>
<div>By ValC from Fairfield, CT</div>
<div class="review-type"><strong>Owns this car</strong></div>
</div>
<div>
<p class="review-body">I love that BMW makes a mid size suv that is part electric now. We purchased the x3 edrive. I do mostly local driving during the week so only need to fill up with gas once a month to every six weeks. Excellent car!</p>
</div>
</div>
<div class="consumer-review-container">
<h3>Best Purchase for the Value and Cost</h3>
<div>
<div>June 7, 2021</div>
<div>By Brandon from Peachtree City from Peachtree City, GA</div>
<div class="review-type"><strong>Owns this car</strong></div>
</div>
<div>
<p class="review-body">The BMW X3 is not a crossover that should be ignored. It is more than I expected coming from someone who has owned a 3 Series BMW for the last 6 years. Now I ask myself, why didn't I opt for the X3 much sooner, especially since it is more spacious with better features than my 3 series. Plus the price was only about $3,000 more for plenty more space and amenities. You owe it to yourself to test drive one soon!</p>
</div>
</div>
'''
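from bs4 import BeautifulSoup

# Name the parser explicitly to avoid the "no parser was explicitly specified" warning
soup = BeautifulSoup(sample, 'html.parser')

data = []
# Each container holds exactly one review, so iterate container by container
for e in soup.select('div.consumer-review-container'):
    data.append({
        'review-title': e.h3.text,
        'review-type': e.select_one('div.review-type').text,
        'review-section': e.select_one('p.review-body').text
    })

print(data)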
Output
[{'review-title': 'Excellent Car',
'review-type': 'Owns this car',
'review-section': 'I love that BMW makes a mid size suv that is part electric now. We purchased the x3 edrive. I do mostly local driving during the week so only need to fill up with gas once a month to every six weeks. Excellent car!'},
{'review-title': 'Best Purchase for the Value and Cost',
'review-type': 'Owns this car',
'review-section': "The BMW X3 is not a crossover that should be ignored. It is more than I expected coming from someone who has owned a 3 Series BMW for the last 6 years. Now I ask myself, why didn't I opt for the X3 much sooner, especially since it is more spacious with better features than my 3 series. Plus the price was only about $3,000 more for plenty more space and amenities. You owe it to yourself to test drive one soon!"}]
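One caveat with the approach above: select_one() returns None when nothing matches, so calling .text on the result raises an AttributeError if a review lacks one of the elements. The following is a minimal sketch of a more defensive variant, not a definitive implementation. The helper name text_or_none and the shortened sample HTML are made up for illustration, and the second review deliberately has no review-type element. It also shows how to get a flat list of only the review bodies, which is what the question was after.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened sample: the second review has no "review-type" element.
html = """
<div class="consumer-review-container">
  <h3>Excellent Car</h3>
  <div class="review-type"><strong>Owns this car</strong></div>
  <p class="review-body">Excellent car!</p>
</div>
<div class="consumer-review-container">
  <h3>Best Purchase</h3>
  <p class="review-body">Test drive one soon!</p>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(html, "html.parser")

# One dict per review container; missing fields become None instead of raising
data = [
    {
        "review-title": text_or_none(e, "h3"),
        "review-type": text_or_none(e, "div.review-type"),
        "review-section": text_or_none(e, "p.review-body"),
    }
    for e in soup.select("div.consumer-review-container")
]

# Flat list of review bodies only, one string per matching <p>
reviews = [p.get_text(strip=True) for p in soup.select("p.review-body")]

print(data)
print(reviews)
```

Selecting p.review-body directly on the whole soup avoids the nested loop from the question, so no empty placeholder entries end up in the list.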