What's wrong with this get method call using BeautifulSoup?

Time:06-22

I'm attempting to scrape a web page. When I run this code, it prints running1 but never running2. Why would this be the case?

Code:

from time import gmtime, strftime

import requests
from bs4 import BeautifulSoup

import smtplib
from email.mime.text import MIMEText

print("running1")

url = "https://www.johnlewis.com/nordictrack-commercial-14-9-elliptical-cross-trainer/p5639979"
response = requests.get(url)

print("running2")

soup = BeautifulSoup(response.text, 'lxml')

print("running3")

CodePudding user response:

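The script most likely stalls inside requests.get(url): the server appears not to answer requests that carry the default python-requests User-Agent, so the call never returns and running2 is never printed. To get a correct response from the server, try specifying a User-Agent HTTP header: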

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0"
}

url = "https://www.johnlewis.com/nordictrack-commercial-14-9-elliptical-cross-trainer/p5639979"
response = requests.get(url, headers=headers)

print(response.text)

Prints:

<!DOCTYPE html><html lang="en"><head>

...
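
Because a stalled requests.get gives no feedback, it may also help to pass a timeout so the failure becomes visible. The following is only a minimal sketch: the helper name fetch_html, the 10-second default and the optional session parameter are assumptions made for illustration and are not part of the original answer. The header value is the one shown above.

```python
import requests

# Same User-Agent value as in the answer above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0"
}


def fetch_html(url, timeout=10, session=None):
    """Return the page body as text, or None if the request fails or times out.

    fetch_html is a hypothetical helper. `session` may be a requests.Session
    (or any object with a compatible get method); by default the
    module-level requests.get is used.
    """
    http = session if session is not None else requests
    try:
        # timeout makes a silent stall raise requests.exceptions.Timeout instead.
        response = http.get(url, headers=BROWSER_HEADERS, timeout=timeout)
        # Turn 4xx/5xx status codes into exceptions as well.
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        print(f"request failed: {exc}")
        return None
    return response.text
```

Calling fetch_html(url) with the URL from the question should then either return the HTML or print the reason the request failed, instead of stopping silently before running2.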