How do I scrape the links from this page?

Time:10-09

I am trying to scrape the links from an Amazon page, but I only get 2 or 3 links back.
The page URL is https://www.amazon.com/s?rh=n:1069242&fs=true&ref=lp_1069242_sar

import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get('https://www.amazon.com/s?rh=n:1069242&fs=true&ref=lp_1069242_sar')
soup = BeautifulSoup(r.content, 'html.parser')
for link in soup.find_all('a', href=True):
    print(link['href'])

CodePudding user response:

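Here is a working solution. It passes the headers to requests.get (the code in the question defines them but never sends them), selects only the anchors that carry the search-result link classes, and uses urljoin to turn relative hrefs into absolute URLs: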

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

base_url = 'https://www.amazon.com'
# Browser-like User-Agent; the 'session' entry is kept as given in this answer.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.71 Safari/537.36',
    'session': '141-2320098-4829807',
}
# Send the headers with the request (the code in the question omitted this).
r = requests.get('https://www.amazon.com/s?rh=n:1069242&fs=true&ref=lp_1069242_sar', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')  # the 'lxml' parser requires the lxml package
# Keep only anchors that carry the search-result link classes.
for link in soup.find_all('a', class_="a-link-normal s-underline-text s-underline-link-text a-text-normal", href=True):
    # Resolve relative hrefs against the site root.
    print(urljoin(base_url, link['href']))

Output:

https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_browse_office-products_sr_pg1_1?ie=UTF8&adId=A05861132UJ9W79S82Z3&url=/Fiskars-Inch-Student-Scissors-Pack/dp/B08CL355MN/ref=sr_1_1_sspa?dchild=1&qid=1633717907&s=office-products&sr=1-1-spons&psc=1&qualifier=1633717907&id=1565389383398743&widgetName=sp_atf_browse
https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_browse_office-products_sr_pg1_1?ie=UTF8&adId=A0918144191FAIKGYK3YC&url=/Fiskars-Inch-Blunt-Kids-Scissors/dp/B00TJSS9ZW/ref=sr_1_2_sspa?dchild=1&qid=1633717907&s=office-products&sr=1-2-spons&psc=1&qualifier=1633717907&id=1565389383398743&widgetName=sp_atf_browse
https://www.amazon.com/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_browse_office-products_sr_pg1_1?ie=UTF8&adId=A09889161KB2CNO5NB8QC&url=/Lind-Kitchen-Dispenser-Decorative-Stationery/dp/B07VRLW5C6/ref=sr_1_3_sspa?dchild=1&qid=1633717907&s=office-products&sr=1-3-spons&psc=1&qualifier=1633717907&id=1565389383398743&widgetName=sp_atf_browse
https://www.amazon.com/Zebra-Pen-Retractable-Ballpoint-18-Count/dp/B00M382RJO/ref=sr_1_4?dchild=1&qid=1633717907&s=office-products&sr=1-4

... and so on
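
The first three links in the output are sponsored redirects that carry the real product path in a url query parameter, while the fourth is a direct product link with tracking parameters appended. If only the product pages are wanted, a small post-processing step could normalize both forms. The following is a minimal sketch using only the standard library. The helper name product_url is hypothetical, and the pattern "/dp/" followed by a 10-character ID is an assumption inferred from the sample output above, not a documented rule.

```python
import re
from urllib.parse import parse_qs, urljoin, urlsplit

BASE_URL = "https://www.amazon.com"
# Assumption: product paths look like ".../dp/<10-character ID>",
# as in the sample output above.
PRODUCT_PATH = re.compile(r"^(.*?/dp/[A-Z0-9]{10})")


def product_url(href):
    """Return a shortened product URL, or None if no /dp/ path is found."""
    parts = urlsplit(urljoin(BASE_URL, href))
    path = parts.path
    if "/slredirect/" in path:
        # Sponsored link: the real target sits in the "url" query parameter.
        targets = parse_qs(parts.query).get("url")
        if not targets:
            return None
        path = urlsplit(targets[0]).path
    match = PRODUCT_PATH.match(path)
    return urljoin(BASE_URL, match.group(1)) if match else None


# Example with one direct link taken from the output above.
sample = (
    "https://www.amazon.com/Zebra-Pen-Retractable-Ballpoint-18-Count"
    "/dp/B00M382RJO/ref=sr_1_4?dchild=1&qid=1633717907&s=office-products&sr=1-4"
)
print(product_url(sample))
```

In the scraping loop, each href could be passed through this helper and the None results skipped. Collecting the results in a set would also drop duplicates.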
