I'm wondering how to scrape information off a website where there are multiple elements with the same identifiers, from which I want to scrape price data. The issue I'm having is that when I loop through each div and print(), each price appears multiple times in the console. I assume this is due to the div I'm locating encapsulating multiple elements with the same tag and class name.
GraphicPrice = soup.findAll('div', class_='col')
for price in GraphicPrice:
    prices = price.find('span', class_='price__amount')
    if prices is None:
        pass
    else:
        print(prices.text)
Output:
£859.99
£859.99
£1,049.99
£1,049.99
£829.99
£829.99
£899.99
£899.99
£999.95
£999.95
£999.95
£999.95
What I want is to eliminate the duplicate information and understand how to refactor my code to prevent this from happening.
Any help would be appreciated. (I'm still learning) :)
CodePudding user response:
First, you select every div tag with the class "col":
soup.findAll('div', class_='col')
Each of these div tags contains one span, and that span has two other span tags nested inside it.
So if you write
price.find('span', class_='price__amount')
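it matches the "price__amount" spans inside every div tag, which means you are starting from the wrong class.
If you want the second span tag, select it through its parent span instead. Note that findAll returns a list of results, which has no .find method, so a CSS selector is the simpler option:
for amount in soup.select('span.price--sale--colored span.price__amount'):
    print(amount.text)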
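Here is a rough, self-contained sketch of the idea in this answer: target the price spans directly rather than going through every div. The HTML is made up for illustration since the real page isn't shown, and the nested div.col layout is only a guess at why each price shows up twice.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; the real page may differ.
html = """
<div class="col">
  <div class="col">
    <span class="price--sale--colored"><span class="price__amount">£859.99</span></span>
  </div>
</div>
<div class="col">
  <div class="col">
    <span class="price--sale--colored"><span class="price__amount">£1,049.99</span></span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Looping over every div.col visits both the outer and the inner div,
# so the same price span is found once per enclosing div.
per_div = [
    div.find("span", class_="price__amount").text
    for div in soup.find_all("div", class_="col")
]

# Selecting the price spans themselves visits each span exactly once.
direct = [
    span.text
    for span in soup.select("span.price--sale--colored span.price__amount")
]

print(per_div)
print(direct)
```

If the real page nests its elements differently, the selector string would need adjusting to match.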
CodePudding user response:
Use .select('span')
to get the elements nested inside the selected element, like this:
GraphicPrice = soup.findAll('div', class_='col')
for price in GraphicPrice:
    prices = price.select('span')
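If duplicates still come through whichever selector you end up with, one fallback is to drop repeated values after scraping while keeping the original order. This is only a sketch using the prices from the question's output; the variable names are made up. Be aware that it also merges two different products that happen to share a price, so fixing the selector is usually the better route.

```python
# Prices copied from the output shown in the question.
scraped = [
    "£859.99", "£859.99", "£1,049.99", "£1,049.99",
    "£829.99", "£829.99", "£899.99", "£899.99",
    "£999.95", "£999.95", "£999.95", "£999.95",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this removes repeats without reordering the prices.
unique_prices = list(dict.fromkeys(scraped))

for price in unique_prices:
    print(price)
```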