I'm trying to scrape the products on a search results page, starting with just the title of each product: https://aiswatches.com/search.php?search_query_adv=16570&section=product
I know I'm looking at the correct section of the HTML in developer tools, but I believe my Python code is missing something to do with the <a> tag, and I can't work out the syntax to add it.
Here is my python code:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://aiswatches.com/search.php?search_query_adv=16570&section=product'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 'prodBox'})
    for item in results:
        products = {
            'title': item.find('div', {'class': 'prodTitle'}).text
            #'price':
            #'link':
        }
    return

soup = get_data(url)
parse(soup)
Any help would be extremely appreciated. I got stuck after following a video tutorial.
CodePudding user response:
I think the part you're missing is the link to the product. You can read it from the <a> tag inside the title div with watch.find('a').get('href'):
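import requests
from bs4 import BeautifulSoup

url = 'https://aiswatches.com/search.php?search_query_adv=16570&section=product'
response = requests.get(url)
# Each product title lives in a div with class "prodTitle"
for watch in BeautifulSoup(response.text, 'html.parser').find_all('div', class_='prodTitle'):
    title = watch.get_text(strip=True)
    # The product link is the href of the <a> tag inside the title div
    link = watch.find('a').get('href')
    # The price div follows the title's parent element in the document
    price = watch.parent.find_next('div', class_='prodPrice').get_text(strip=True)
    print(title, price, link)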
OUTPUT:
Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers $10,975.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers/
Rolex Polar Explorer II 16570 M Serial White Dial 3186 Movement Box & Papers $12,575.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-m-serial-white-dial-3186-movement-box-papers/
Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers RSC Serviced $10,975.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers-rsc-serviced-1/
Rolex Polar Explorer II 16570 Y Serial White Dial Box & Paper $10,975.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-y-serial-white-dial-box-paper/
Rolex Polar Explorer II 216570 White Dial 42mm Random Serial B&P $11,975.00 https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-b-p-1/
Rolex Polar Explorer II 216570 White Dial 42mm Random Serial Box & Paper $12,975.00 https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-box-paper/
Rolex Explorer II 16570 M Serial Black Dial 3186 Movement $9,975.00 https://aiswatches.com/rolex-explorer-ii-16570-m-serial-black-dial-3186-movemen/
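Separately, the parse function in the question builds a dict for each product but never stores or returns it, so nothing comes out even when the selectors match. A minimal sketch of one way to fix that is below. It runs against an inline HTML snippet rather than the live site; the sample markup is hypothetical and only reuses the class names mentioned in this thread (prodBox, prodTitle, prodPrice), so the real page may be nested differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class names used in this thread;
# the real page may be structured differently.
SAMPLE_HTML = """
<div class="prodBox">
  <div class="prodTitle"><a href="https://example.com/watch-a/">Watch A</a></div>
  <div class="prodPrice">$100.00</div>
</div>
<div class="prodBox">
  <div class="prodTitle"><a href="https://example.com/watch-b/">Watch B</a></div>
</div>
"""

def parse(soup):
    """Collect one dict per product box and return them all as a list."""
    products = []
    for item in soup.find_all('div', class_='prodBox'):
        title_div = item.find('div', class_='prodTitle')
        if title_div is None:
            # Skip boxes without a title instead of raising AttributeError
            continue
        anchor = title_div.find('a')
        price_div = item.find('div', class_='prodPrice')
        products.append({
            'title': title_div.get_text(strip=True),
            'price': price_div.get_text(strip=True) if price_div else None,
            'link': anchor.get('href') if anchor else None,
        })
    return products

products = parse(BeautifulSoup(SAMPLE_HTML, 'html.parser'))
for product in products:
    print(product)
```

The None checks matter because find() returns None when nothing matches, and calling .text or .get() on None is a common cause of crashes in scrapers like this one.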
CodePudding user response:
You can get the desired output using CSS selectors:
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://aiswatches.com/search.php?search_query_adv=16570&section=product'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')

data = []
for card in soup.select('.productGrid li'):
    title = card.select_one('div.prodTitle > a').get_text(strip=True)
    price = card.select_one('div.wirePrice span').get_text(strip=True)
    link = card.select_one('div.prodTitle a').get('href')
    data.append({
        'title': title,
        'price': price,
        'link': link
    })

print(data)
# df = pd.DataFrame(data)
# print(df)
Output:
[{'title': 'Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers', 'price': '$10,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers/'}, {'title': 'Rolex Polar Explorer II 16570 M Serial White Dial 3186 Movement Box & Papers', 'price': '$12,575.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-m-serial-white-dial-3186-movement-box-papers/'}, {'title': 'Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers RSC Serviced', 'price': '$10,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers-rsc-serviced-1/'}, {'title': 'Rolex Polar Explorer II 16570 Y Serial White Dial Box & Paper', 'price': '$10,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-y-serial-white-dial-box-paper/'}, {'title': 'Rolex Polar Explorer II 216570 White Dial 42mm Random Serial B&P', 'price': '$11,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-b-p-1/'}, {'title': 'Rolex Polar Explorer II 216570 White Dial 42mm Random Serial Box & Paper', 'price': '$12,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-box-paper/'}, {'title': 'Rolex Explorer II 16570 M Serial Black Dial 3186 Movement', 'price': '$9,975.00', 'link': 'https://aiswatches.com/rolex-explorer-ii-16570-m-serial-black-dial-3186-movemen/'}]
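Once you have a list of dicts like this, saving it is straightforward. Here is a rough sketch using the standard library csv module instead of pandas (pd.DataFrame(data).to_csv(...) would be the pandas route). The helper names to_csv_text and price_to_float are made up for illustration, and the two rows are copied from the output above so the snippet runs without hitting the site.

```python
import csv
import io

# Two rows copied from the output above, so no network request is needed
data = [
    {'title': 'Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers',
     'price': '$10,975.00',
     'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers/'},
    {'title': 'Rolex Explorer II 16570 M Serial Black Dial 3186 Movement',
     'price': '$9,975.00',
     'link': 'https://aiswatches.com/rolex-explorer-ii-16570-m-serial-black-dial-3186-movemen/'},
]

def to_csv_text(rows):
    """Serialize the scraped rows to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['title', 'price', 'link'])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def price_to_float(price):
    """Turn a string such as '$10,975.00' into a float (hypothetical helper)."""
    return float(price.replace('$', '').replace(',', ''))

csv_text = to_csv_text(data)
prices = [price_to_float(row['price']) for row in data]
print(csv_text)
print(prices)
```

To write a file instead of a string, open it with open('watches.csv', 'w', newline='') and pass the file object to csv.DictWriter in place of the StringIO buffer; the file name here is just an example.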