I was wondering how I can get the product dimensions and weight from an Amazon page. This is the page: https://www.amazon.com/Vic-Firth-American-5B-Drumsticks/dp/B0002F73Z8/ref=psdc_11966431_t1_B0064RNNP2?th=1 There is a place where it says:
Item weight: 3.2 ounces
Product dimensions: 16 x 0.6 x 0.6
I am new to web scraping and Python, so if you could please help me, that would be awesome!
CodePudding user response:
First, install Selenium on your computer. Once it is set up, open the page in your browser and click Inspect. Find the element you want to scrape, then copy its XPath. You can follow the Selenium with Python docs or watch a tutorial for more detail.
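To make the XPath step a bit more concrete, here is a minimal sketch of what an XPath lookup does. It uses Python's built-in xml.etree.ElementTree instead of Selenium so it runs without a browser. The sample markup and the `value_for` helper are made up for illustration, and the real page's HTML will differ. ElementTree also supports only a subset of XPath, so an expression copied from the browser inspector may need Selenium or another full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a product details table.
# The real page's markup is an assumption here and will look different.
SAMPLE = """
<table id="details">
  <tr><th>Item Weight</th><td class="prodDetAttrValue">3.2 ounces</td></tr>
  <tr><th>Product Dimensions</th><td class="prodDetAttrValue">16 x 0.6 x 0.6 inches</td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)


def value_for(label):
    """Return the text of the <td> in the row whose <th> equals label, or None."""
    # XPath: any <tr> that has a <th> child with this exact text, then its <td>.
    cell = root.find(f".//tr[th='{label}']/td")
    return cell.text.strip() if cell is not None else None


weight = value_for("Item Weight")
dimensions = value_for("Product Dimensions")
print(weight)
print(dimensions)
```

The point is that the XPath pins down the cell by its row label, so the lookup does not depend on the order of the rows.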
CodePudding user response:
I would advise you to look into requests, Beautiful Soup, and Selenium. These are useful libraries for web scraping. Also, I believe Amazon blocks a lot of scraping requests, so you will need to mimic a regular user's browser for it to work.
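As a rough sketch of what "mimic a regular user's browser" can mean in practice, the snippet below builds a request that carries browser-like headers. It uses the standard library's urllib.request so it has no dependencies; the same headers dict can be passed to requests. The `build_request` helper and the Accept-Language value are assumptions for illustration, and there is no guarantee these headers are enough to avoid being blocked.

```python
import urllib.request

# A desktop Chrome User-Agent string; any current browser string would do.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36"
)


def build_request(url):
    """Build a GET request with browser-like headers. Nothing is sent yet."""
    headers = {
        "User-Agent": USER_AGENT,
        # Assumed value; browsers normally send some Accept-Language header.
        "Accept-Language": "en-US,en;q=0.9",
    }
    return urllib.request.Request(url, headers=headers)


request = build_request("https://www.amazon.com/dp/B0002F73Z8?th=1")
# urllib stores header names capitalized, hence "User-agent".
print(request.get_header("User-agent"))
```

To actually fetch the page you would pass `request` to `urllib.request.urlopen`, and the site may still refuse the request.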
CodePudding user response:
This scrapes the page for the specific content in your question. From here, you'd want to loop over a list of the URLs you're interested in. Hope this helps :)
#!/bin/python3
from bs4 import BeautifulSoup
import requests

# Send browser-like headers; replace <YOUR_COOKIE> with a cookie from your own browser session.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'cookie': '<YOUR_COOKIE>',
}
req = requests.get("https://www.amazon.com/dp/B0002F73Z8?th=1", headers=headers)
soup = BeautifulSoup(req.content.decode("utf-8"), 'html.parser')

# Print the text of every <td> whose class list includes "prodDetAttrValue".
for res in soup.find_all('td'):
    try:
        if "prodDetAttrValue" in res['class']:
            print(res.get_text())
    except KeyError:
        # This <td> has no class attribute, so skip it.
        pass
Output:
3.2 ounces
16 x 0.6 x 0.6 inches
USA
B0002F73Z8
5B
No
April 13, 2004
Natural
Hickory
5b
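If you need the weight and dimensions as numbers rather than strings, something like the sketch below could follow the scraping step. The `parse_weight` and `parse_dimensions` helpers are hypothetical and assume the text has exactly the shape shown in the output above ("3.2 ounces" and "16 x 0.6 x 0.6 inches"); other products may use different units or formats.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"  # an integer or a decimal such as 16 or 0.6


def parse_weight(text):
    """Split a string like '3.2 ounces' into (3.2, 'ounces')."""
    match = re.fullmatch(rf"\s*{NUMBER}\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised weight: {text!r}")
    return float(match.group(1)), match.group(2).lower()


def parse_dimensions(text):
    """Split '16 x 0.6 x 0.6 inches' into ((16.0, 0.6, 0.6), 'inches')."""
    pattern = rf"\s*{NUMBER}\s*x\s*{NUMBER}\s*x\s*{NUMBER}\s*([A-Za-z]+)\s*"
    match = re.fullmatch(pattern, text)
    if match is None:
        raise ValueError(f"unrecognised dimensions: {text!r}")
    sizes = tuple(float(match.group(i)) for i in (1, 2, 3))
    return sizes, match.group(4).lower()


weight = parse_weight("3.2 ounces")
dimensions = parse_dimensions("16 x 0.6 x 0.6 inches")
print(weight)      # (3.2, 'ounces')
print(dimensions)  # ((16.0, 0.6, 0.6), 'inches')
```

Raising ValueError on anything unexpected makes it obvious when a page uses a format these patterns don't cover, instead of silently returning wrong numbers.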