Scraping info of product but not getting desired output

Time:06-28

I'm attempting to scrape the products on a search results page, and as a first step I'm only trying to scrape the title of each product: https://aiswatches.com/search.php?search_query_adv=16570&section=product

I know I'm looking at the correct section of the HTML in developer tools, but I believe my Python code is missing something to do with the <a> tag, and I can't get the syntax right to add it: [screenshot of the HTML code]

Here is my Python code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://aiswatches.com/search.php?search_query_adv=16570&section=product'

def get_data(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    return soup

def parse(soup):
    results = soup.find_all('div', {'class': 'prodBox'})
    for item in results:
        products = {
            'title': item.find('div', {'class': 'prodTitle'}).text
            #'price':
            #'link':
        }
    return

soup = get_data(url)
parse(soup)

Any help would be extremely appreciated. I got stuck after following a video tutorial.
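One reason the code above produces nothing useful is that parse() rebuilds the products dict on every pass of the loop and then ends with a bare return, so it always returns None. Below is a minimal sketch of a version that collects every product into a list and returns it. It runs against a small hypothetical HTML snippet (SAMPLE_HTML, reusing titles and links from the output shown in the answers) and assumes each div.prodBox contains a div.prodTitle wrapping an <a> tag; the live page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: assumes each product is a div.prodBox whose
# div.prodTitle wraps an <a> tag. The real page may differ.
SAMPLE_HTML = """
<div class="prodBox">
  <div class="prodTitle">
    <a href="https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers/">Rolex Polar Explorer II 16570 P Serial White Dial Box &amp; Papers</a>
  </div>
</div>
<div class="prodBox">
  <div class="prodTitle">
    <a href="https://aiswatches.com/rolex-explorer-ii-16570-m-serial-black-dial-3186-movemen/">Rolex Explorer II 16570 M Serial Black Dial 3186 Movement</a>
  </div>
</div>
"""


def parse(soup):
    products = []  # one dict per product, instead of overwriting a single dict
    for item in soup.find_all('div', {'class': 'prodBox'}):
        title_div = item.find('div', {'class': 'prodTitle'})
        if title_div is None:  # skip boxes that have no title
            continue
        anchor = title_div.find('a')  # the title text and URL live in the <a> tag
        products.append({
            'title': title_div.get_text(strip=True),
            'link': anchor.get('href') if anchor else None,
        })
    return products  # return the collected list rather than None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
for product in parse(soup):
    print(product['title'], product['link'])
```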

Answer:

I think your problem is getting the link to the product. The URL is in the <a> tag inside each prodTitle div, so you can read it with link = watch.find('a').get('href'):

import requests
from bs4 import BeautifulSoup

url = 'https://aiswatches.com/search.php?search_query_adv=16570&section=product'
response = requests.get(url)
for watch in BeautifulSoup(response.text, 'html.parser').find_all('div', class_='prodTitle'):
    title = watch.get_text().strip()
    link = watch.find('a').get('href')  # the product URL is the href of the <a> inside prodTitle
    price = watch.parent.find_next('div', class_='prodPrice').get_text().strip()
    print(title, price, link)

OUTPUT:

Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers $10,975.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers/
Rolex Polar Explorer II 16570 M Serial White Dial 3186 Movement Box & Papers $12,575.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-m-serial-white-dial-3186-movement-box-papers/
Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers   RSC Serviced $10,975.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers-rsc-serviced-1/
Rolex Polar Explorer II 16570 Y Serial White Dial Box & Paper $10,975.00 https://aiswatches.com/rolex-polar-explorer-ii-16570-y-serial-white-dial-box-paper/
Rolex Polar Explorer II 216570 White Dial 42mm Random Serial B&P $11,975.00 https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-b-p-1/
Rolex Polar Explorer II 216570 White Dial 42mm Random Serial Box & Paper $12,975.00 https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-box-paper/
Rolex Explorer II 16570 M Serial Black Dial 3186 Movement $9,975.00 https://aiswatches.com/rolex-explorer-ii-16570-m-serial-black-dial-3186-movemen/
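The price lookup in this answer relies on find_next(), which searches forward in document order from the given tag instead of only inside it. Below is a self-contained sketch of the same logic, run against hypothetical markup (SAMPLE_HTML, assumed to place div.prodPrice after the title's parent rather than inside the title div), with guards for a missing <a> or price element. The function name extract() is illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names in the answer; it assumes
# div.prodPrice follows the prodTitle's parent instead of sitting inside prodTitle.
SAMPLE_HTML = """
<li>
  <div class="prodBox">
    <div class="prodTitle"><a href="https://aiswatches.com/rolex-polar-explorer-ii-16570-y-serial-white-dial-box-paper/">Rolex Polar Explorer II 16570 Y Serial White Dial Box &amp; Paper</a></div>
  </div>
  <div class="prodPrice">$10,975.00</div>
</li>
"""


def extract(html):
    rows = []
    soup = BeautifulSoup(html, 'html.parser')
    for watch in soup.find_all('div', class_='prodTitle'):
        anchor = watch.find('a')
        # find_next() walks forward in document order from the parent,
        # so it reaches a price div that is a sibling rather than a child
        price_div = watch.parent.find_next('div', class_='prodPrice')
        rows.append((
            watch.get_text(strip=True),
            price_div.get_text(strip=True) if price_div else None,
            anchor.get('href') if anchor else None,
        ))
    return rows


for title, price, link in extract(SAMPLE_HTML):
    print(title, price, link)
```

One caveat of this design: if a product has no price of its own, find_next() keeps searching and returns the next product's price, so scoping each lookup to the product's own container is the safer choice.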

Answer:

You can get the desired output using CSS selectors:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://aiswatches.com/search.php?search_query_adv=16570&section=product'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')  # 'lxml' needs the lxml package; 'html.parser' also works

data = []
for card in soup.select('.productGrid li'):  # one <li> per product card
    title = card.select_one('div.prodTitle > a').get_text(strip=True)
    price = card.select_one('div.wirePrice span').get_text(strip=True)
    link = card.select_one('div.prodTitle a').get('href')

    data.append({
        'title': title,
        'price': price,
        'link': link
    })

print(data)
# df = pd.DataFrame(data)
# print(df)

Output:

[{'title': 'Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers', 'price': '$10,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers/'}, {'title': 'Rolex Polar Explorer II 16570 M Serial White Dial 3186 Movement Box & Papers', 'price': '$12,575.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-m-serial-white-dial-3186-movement-box-papers/'}, {'title': 'Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers   RSC Serviced', 'price': '$10,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers-rsc-serviced-1/'}, {'title': 'Rolex Polar Explorer II 16570 Y Serial White Dial Box & Paper', 'price': '$10,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-y-serial-white-dial-box-paper/'}, {'title': 'Rolex Polar Explorer II 216570 White Dial 42mm Random Serial B&P', 'price': '$11,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-b-p-1/'}, {'title': 'Rolex Polar Explorer II 216570 White Dial 42mm Random Serial Box & Paper', 'price': '$12,975.00', 'link': 'https://aiswatches.com/rolex-polar-explorer-ii-216570-white-dial-42mm-random-serial-box-paper/'}, {'title': 'Rolex Explorer II 16570 M Serial Black Dial 3186 Movement', 'price': '$9,975.00', 'link': 'https://aiswatches.com/rolex-explorer-ii-16570-m-serial-black-dial-3186-movemen/'}]
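The commented-out lines at the end of this answer turn data into a pandas DataFrame. Below is a minimal sketch of that step, using two entries copied from the output above. It adds a hypothetical price_usd column that strips the $ sign and thousands separators so the prices can be sorted or compared numerically.

```python
import pandas as pd

# Two entries copied from the output above; in practice `data` comes from the scraping loop.
data = [
    {'title': 'Rolex Polar Explorer II 16570 P Serial White Dial Box & Papers',
     'price': '$10,975.00',
     'link': 'https://aiswatches.com/rolex-polar-explorer-ii-16570-p-serial-white-dial-box-papers/'},
    {'title': 'Rolex Explorer II 16570 M Serial Black Dial 3186 Movement',
     'price': '$9,975.00',
     'link': 'https://aiswatches.com/rolex-explorer-ii-16570-m-serial-black-dial-3186-movemen/'},
]

df = pd.DataFrame(data)
# Remove '$' and ',' in one regex pass, then convert the text to a float
df['price_usd'] = df['price'].str.replace(r'[$,]', '', regex=True).astype(float)
print(df.sort_values('price_usd')[['title', 'price_usd']])
```

A float column is fine for sorting and display; for exact currency arithmetic, decimal.Decimal would be the safer type.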