Unable to scrape different plumber names from a static webpage using requests module

Time:08-06

I've spent the last couple of hours trying to scrape plumber names from this webpage with the requests module. The site's content is static and is visible in the page source (Ctrl+U).

However, when I run the script, I get the following error:

raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

This is how I'm trying:

from bs4 import BeautifulSoup
import requests
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

link = 'https://www.yellowpages.co.za/search?what=plumber&where=bryanston+west,+sandton,+gauteng&pg=2'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Referer': 'https://www.yellowpages.co.za/search?what=plumber&where=bryanston+west,+sandton,+gauteng&pg=1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Host': 'www.yellowpages.co.za',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link,verify=False)
    soup = BeautifulSoup(res.text,"html.parser")
    for shop_name in soup.select("h5.nameOverflow"):
        print(shop_name.get_text(strip=True))

CodePudding user response:

I tried to make it work with requests, but in the end I used urlopen() from Python's standard library:

import ssl
from bs4 import BeautifulSoup
from urllib.request import urlopen


ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

response = urlopen(
    "https://www.yellowpages.co.za/search?what=plumber&where=bryanston+west,+sandton,+gauteng&pg=2",
    context=ctx,
)

soup = BeautifulSoup(response, "html.parser")
for shop_name in soup.select("h5.nameOverflow"):
    print(shop_name.get_text(strip=True))

Prints:

Absolutely Fast Plumbing Co CC
Property Matters Gauteng (Pty) Ltd
Plumlite
Geyser Man
Daryn's Plumbing Services (Pty) Ltd
Electroc
Outek Engineers CC
Bryanston Plumbing (Pty) Ltd
Mage Plumbing & Electrical
Fourways Plumbing
Matrix Plumber
Angel Plumbers
Renovations And Maintenance Services
DCB Supplies
Call Us Plumbing
A B A Group
Action Plumbing
Clearline Plumbing Services
Capital Plumbing Supplies
AGD Plumbing
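
Unlike requests, urlopen() does not encode the query string for you, and recent Python versions may reject a URL that contains literal spaces. A minimal sketch of building the same search URL with urllib.parse.urlencode follows; the helper name build_search_url is hypothetical, and the parameter names are simply the ones visible in the link above:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.yellowpages.co.za/search"


def build_search_url(what, where, page):
    """Return the search URL with the query string safely encoded.

    Hypothetical helper: the parameter names (what, where, pg) are taken
    from the link in the question, not from any documented API.
    """
    query = urlencode({"what": what, "where": where, "pg": page})
    return f"{BASE_URL}?{query}"


url = build_search_url("plumber", "bryanston west, sandton, gauteng", 2)
print(url)
```

The resulting string can be passed to urlopen() in place of the hard-coded URL.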

CodePudding user response:

You're getting this error because the request to that link is redirected quite a few times, more than the 30 redirects that requests allowed before giving up. Take a look at this thread, and let me know if that works for you.
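
To see where the redirects actually go, one option is to turn off automatic redirect handling and follow each hop by hand. This is only a rough sketch: trace_redirects is a hypothetical helper, and it assumes a session object whose get() accepts allow_redirects (as requests.Session does) and whose responses expose status_code and headers:

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(session, url, max_hops=10, **kwargs):
    """Follow redirects one hop at a time.

    Returns a list of (status_code, requested_url) pairs, stopping at the
    first non-redirect response or after max_hops requests.
    """
    hops = []
    for _ in range(max_hops):
        # Ask for the page without letting the session follow redirects.
        resp = session.get(url, allow_redirects=False, **kwargs)
        hops.append((resp.status_code, url))
        location = resp.headers.get("Location")
        if resp.status_code not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

With requests this could be called as trace_redirects(s, link, verify=False) inside the with block from the question. If the same URLs keep repeating in the result, the server is bouncing the request in a loop rather than leading to a final page.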
