I am web scraping for the first time and ran into a problem: several elements share the same class name.
This is the code:
import requests
from bs4 import BeautifulSoup

testlink = 'https://www.ah.nl/producten/product/wi387906/wasa-volkoren'
r = requests.get(testlink)
soup = BeautifulSoup(r.content, 'html.parser')
# find_all returns every <dd> that carries this class
products = soup.find_all('dd', class_='product-info-definition-list_value__kspp6')
And this is the output:
[<dd >13 g</dd>, <dd >20</dd>, <dd >Rogge, Glutenbevattende Granen</dd>, <dd >Sesamzaad, Melk</dd>]
I need to get the third element (Rogge, Glutenbevattende Granen). I am using this link for testing and eventually want to scrape multiple pages of the website. Does anyone have any tips?
Thank you!
CodePudding user response:
You can select all of the dd tags with the class product-info-definition-list_value__kspp6 and then use list slicing to keep only the ones you need:
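import requests
from bs4 import BeautifulSoup

url = 'https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'

# Walk through listing pages 1 to 10
for page in range(1, 11):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')
    # NOTE: the attribute selector inside div[...] is missing from the post;
    # fill in the one that matches the product cards on the listing page
    for link in soup.select('div[] a'):
        # Prepend the site root to the link's href
        abs_url = 'https://www.ah.nl' + link.get('href')
        # print(abs_url)
        req2 = requests.get(abs_url)
        soup2 = BeautifulSoup(req2.content, 'html.parser')
        # Collect the text of every matching <dd>, then slice the list
        dd = [d.get_text() for d in soup2.select('dd.product-info-definition-list_value__kspp6')][2:-2]
        print(dd)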
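To get only the third value, as the question asks, plain indexing on the list of texts may be enough. Below is a minimal sketch that uses the four values from the question's output as sample data; `nth_value` is a hypothetical helper (not part of Beautiful Soup) that guards against product pages with fewer matches. In the real script, `values` would come from something like `[d.get_text() for d in products]`.

```python
# Texts of the four <dd> elements shown in the question's output (sample data).
values = ['13 g', '20', 'Rogge, Glutenbevattende Granen', 'Sesamzaad, Melk']


def nth_value(items, index, default=None):
    """Return items[index], or default when the list is too short."""
    if -len(items) <= index < len(items):
        return items[index]
    return default


# Index 2 is the third element.
third = nth_value(values, 2)
print(third)

# A slice always returns a list, even when it holds a single item.
print(values[2:3])

# With exactly four items, [2:-2] starts and stops at index 2, so it is empty.
print(values[2:-2])
```

Indexing returns a single string, while slicing returns a list, so which one fits depends on whether later code expects one value or several. Positions can differ between products, so the index is worth checking against a few pages before scraping many of them.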