How to get product dimensions from an Amazon page with Python web scraping

I was wondering how I can get the product dimensions and weight from an Amazon page. This is the page: https://www.amazon.com/Vic-Firth-American-5B-Drumsticks/dp/B0002F73Z8/ref=psdc_11966431_t1_B0064RNNP2?th=1 There is a place where it says:

item weight: 3.2 ounces

product dimensions: 16 x 0.6 x 0.6.

I am new to web scraping and Python, so if you could please help me, that would be awesome!

CodePudding user response：

First, install Selenium on your computer. Once it is set up, open the page in your browser, right-click the value you want, and click Inspect. Find the element you need to scrape, then copy its XPath. You can follow the Selenium with Python docs or watch any tutorial for more detail.
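As a rough sketch of those steps, assuming Selenium 4 and a local Chrome installation that Selenium can drive: the helper names `detail_xpath` and `read_detail` are made up for illustration, and the XPath assumes each detail row is a `<th>` label followed by a `<td>` value, which you should confirm with Inspect because Amazon's markup varies between products.

```python
def detail_xpath(label):
    """Build an XPath for the value cell next to a given row label."""
    # Assumption: each detail row is a <th> label followed by a <td> value.
    return f"//th[normalize-space()='{label}']/following-sibling::td[1]"


def read_detail(url, label):
    """Open the page in Chrome and return the text of one detail cell."""
    # Imported here so detail_xpath() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        cell = driver.find_element(By.XPATH, detail_xpath(label))
        return cell.text.strip()
    finally:
        # Always close the browser, even if the element was not found.
        driver.quit()


# Example call (starts a real browser; the label text is an assumption):
# read_detail("https://www.amazon.com/dp/B0002F73Z8?th=1", "Item Weight")
```

If the element is missing, `find_element` raises `NoSuchElementException`, which usually means the label text or the table layout differs from what the XPath assumes.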

CodePudding user response：

I would advise you to look into requests, Beautiful Soup, and Selenium. These are useful libraries for web scraping. Also, I believe Amazon blocks a lot of scraping requests, so you will need to mimic a regular user's browser for it to work.

CodePudding user response：

The script below scrapes the page for the specific content in your question. From this point, you'd want to loop over a list of desired URLs. Hope this helps :)

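#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup

# Send browser-like headers so the request looks like it comes from a regular browser.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'cookie': '<YOUR_COOKIE>',  # replace with the cookie string from your own browser session
}
req = requests.get("https://www.amazon.com/dp/B0002F73Z8?th=1", headers=headers)
req.raise_for_status()  # stop early if the request was blocked or failed
soup = BeautifulSoup(req.content.decode("utf-8"), 'html.parser')

# Print every value cell of the product details table (class "prodDetAttrValue").
for res in soup.find_all('td', class_='prodDetAttrValue'):
    print(res.get_text())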

output:

3.2 ounces 
 16 x 0.6 x 0.6 inches 
 USA 
 B0002F73Z8 
 5B 
 No 
 April 13, 2004 
 Natural 
 Hickory 
 5b
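
Since the question only asks for the weight and the dimensions, here is one possible way to pick those two values out of the printed list and turn them into numbers. This is only a sketch: the function names are hypothetical, the regular expressions are based solely on the two sample values above (so other units or formats will need extra patterns), and removing the invisible left-to-right mark is a defensive assumption about what the page may put in front of a value.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
# Assumption: weights look like "3.2 ounces"; the unit list is a guess.
WEIGHT_RE = re.compile(rf"^({NUMBER})\s*(ounces|pounds|grams|kilograms)$", re.IGNORECASE)
# Assumption: dimensions look like "16 x 0.6 x 0.6 inches".
DIMENSIONS_RE = re.compile(
    rf"^({NUMBER})\s*x\s*({NUMBER})\s*x\s*({NUMBER})\s*(inches|cm)?$", re.IGNORECASE
)


def clean(text):
    """Drop invisible left-to-right marks and surrounding whitespace."""
    return text.replace("\u200e", "").strip()


def parse_weight(text):
    """Return (value, unit) for a string like '3.2 ounces', else None."""
    match = WEIGHT_RE.match(clean(text))
    if match is None:
        return None
    return float(match.group(1)), match.group(2).lower()


def parse_dimensions(text):
    """Return ((a, b, c), unit) for '16 x 0.6 x 0.6 inches', else None."""
    match = DIMENSIONS_RE.match(clean(text))
    if match is None:
        return None
    sizes = tuple(float(match.group(i)) for i in (1, 2, 3))
    unit = match.group(4).lower() if match.group(4) else None
    return sizes, unit


def pick_weight_and_dimensions(values):
    """Scan scraped cell texts and return the first weight and dimensions found."""
    weight = dimensions = None
    for value in values:
        if weight is None:
            weight = parse_weight(value)
        if dimensions is None:
            dimensions = parse_dimensions(value)
    return weight, dimensions


# The first few values from the output above, spaces included.
scraped = ["3.2 ounces ", " 16 x 0.6 x 0.6 inches ", " USA ", " B0002F73Z8 "]
weight, dimensions = pick_weight_and_dimensions(scraped)
print(weight)      # (3.2, 'ounces')
print(dimensions)  # ((16.0, 0.6, 0.6), 'inches')
```

In the scraper itself you would collect `res.get_text()` into a list instead of printing it, then pass that list to `pick_weight_and_dimensions`.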