I am new here and have read through many of the older posts, but I cannot find exactly what I am looking for.
I am new to web scraping and have successfully scraped data from a handful of sites.
However, I am having trouble with the code below. I am trying to extract the product titles using Beautiful Soup, but it does not return any data. Any help would be appreciated:
from bs4 import BeautifulSoup
import requests
import pandas as pd
webpage = requests.get('https://groceries.asda.com/aisle/beer-wine-spirits/spirits/whisky/1215685911554-1215685911575-1215685911576')
sp = BeautifulSoup(webpage.content, 'html.parser')
title = sp.find_all('h3', class_='co-product__title')
print(title)
I assume my issue lies somewhere in the find_all call, but I cannot quite work out how to resolve it.
Regards, Milan
CodePudding user response:
You could try requesting this URL instead; it seems to return the information you want:
from bs4 import BeautifulSoup
import requests
webpage = requests.get("https://groceries.asda.com/api/items/iconmetadata?request_origin=gi")
sp = BeautifulSoup(webpage.content, "html.parser")
print(sp)
Let me know if this helps.
Thanks,
-VikingOfValhalla
CodePudding user response:
Try this:
from bs4 import BeautifulSoup
import requests
import pandas as pd
webpage = requests.get('https://groceries.asda.com/aisle/beer-wine-spirits/spirits/whisky/1215685911554-1215685911575-1215685911576')
sp = BeautifulSoup(webpage.content, 'html.parser')
title = sp.find_all('h3', {'class':'co-product__title'})
print(title[0])
Also, I prefer the lxml parser:
sp = BeautifulSoup(webpage.text, 'lxml')
Note that find_all returns a list of all elements with that class. If you want just the first match, use .find, i.e.:
title = sp.find('h3', {'class':'co-product__title'})
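To make the difference between find_all and find concrete, here is a minimal sketch that runs against inline HTML rather than the live site. The two HTML snippets and the helper names titles_all and title_first are made up for illustration; only the tag name and class come from the question. It also shows why indexing title[0] is risky: when nothing matches, find_all gives an empty list and find gives None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fully rendered product listing.
RENDERED = """
<ul>
  <li><h3 class="co-product__title"><a href="#">Example Whisky A</a></h3></li>
  <li><h3 class="co-product__title"><a href="#">Example Whisky B</a></h3></li>
</ul>
"""

# Hypothetical markup standing in for a page with no product elements in it.
EMPTY_SHELL = "<div id='root'></div>"


def titles_all(html):
    """Return the text of every matching h3 (empty list if none match)."""
    soup = BeautifulSoup(html, "html.parser")
    matches = soup.find_all("h3", class_="co-product__title")
    return [h3.get_text(strip=True) for h3 in matches]


def title_first(html):
    """Return the text of the first matching h3, or None if none match."""
    soup = BeautifulSoup(html, "html.parser")
    h3 = soup.find("h3", class_="co-product__title")
    return h3.get_text(strip=True) if h3 is not None else None


print(titles_all(RENDERED))
print(title_first(RENDERED))
print(titles_all(EMPTY_SHELL))
print(title_first(EMPTY_SHELL))
```

Checking for an empty list or None before indexing avoids an IndexError or AttributeError when the selector matches nothing.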
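Sorry to rain on this parade, but you won't be able to scrape this data without a webdriver, because the product list is rendered by JavaScript after the page loads. Alternatively, you can call the API directly. You should research how to get post-rendered JavaScript content in Python.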
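A quick way to check this yourself, before reaching for a webdriver, is to see whether the class name appears anywhere in the raw response body. The sketch below uses only the standard library; the helper name explain_empty_result and the sample strings are hypothetical, and the conclusion is only a heuristic, since a blocked or redirected request would also produce a body without the class.

```python
def explain_empty_result(html: str, class_name: str) -> str:
    """Heuristic: say whether class_name occurs in the raw HTML text at all."""
    if class_name in html:
        # The markup is in the response, so the selector or tag name is the suspect.
        return "present in static HTML: check the tag name and selector"
    # No parser can find what the server never sent in the initial response.
    return "absent from static HTML: content is probably rendered by JavaScript"


# Hypothetical response bodies for illustration only.
static_page = '<h3 class="co-product__title">Example Whisky</h3>'
js_shell = '<div id="root"></div><script src="/app.js"></script>'

print(explain_empty_result(static_page, "co-product__title"))
print(explain_empty_result(js_shell, "co-product__title"))

# With the code from the question, the call would be:
#   explain_empty_result(webpage.text, "co-product__title")
```

If the class is absent, the options are the ones named above: drive a real browser so the JavaScript runs, or find the request the page makes for its data and call that endpoint directly.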