I'm attempting to scrape a web page. When I run this code, it prints running1 but not running2. Why would this be the case?
Code:
from time import gmtime, strftime
import requests
from bs4 import BeautifulSoup
import smtplib
from email.mime.text import MIMEText
print("running1")
url = "https://www.johnlewis.com/nordictrack-commercial-14-9-elliptical-cross-trainer/p5639979"
response = requests.get(url)
print("running2")
soup = BeautifulSoup(response.text, 'lxml')
print("running3")
Answer:
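The request most likely never completes because the server does not answer the default User-Agent that requests sends, so execution stalls at requests.get() before running2 is printed. To get a proper response from the server, try specifying a User-Agent HTTP header: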
import requests
headers = {
"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0"
}
url = "https://www.johnlewis.com/nordictrack-commercial-14-9-elliptical-cross-trainer/p5639979"
response = requests.get(url, headers=headers)
print(response.text)
Prints:
<!DOCTYPE html><html lang="en"><head>
...
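
If the request can still stall, passing a timeout makes the failure visible instead of blocking forever, since requests does not time out unless you ask it to. Below is a minimal sketch, assuming a 10-second timeout is acceptable for this page. The function name fetch_html and its get parameter are hypothetical, added only to keep the example self-contained and testable; they are not part of the original code.

```python
import requests

# Browser-like User-Agent copied from the answer above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0"
}


def fetch_html(url, timeout=10, get=requests.get):
    """Return the page HTML, or None if the request fails or times out.

    fetch_html, the 10-second default and the `get` parameter are
    illustrative assumptions, not part of the original question or answer.
    """
    try:
        # Without timeout=, requests waits indefinitely for the server.
        response = get(url, headers=HEADERS, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx status codes
    except requests.exceptions.RequestException as exc:
        # Timeout and ConnectionError are both subclasses of RequestException.
        print(f"request failed: {exc!r}")
        return None
    return response.text


if __name__ == "__main__":
    url = "https://www.johnlewis.com/nordictrack-commercial-14-9-elliptical-cross-trainer/p5639979"
    html = fetch_html(url)
    print("got HTML" if html is not None else "no HTML")
```

With this in place, a server that never answers produces a "request failed" message after the timeout rather than a script that appears frozen between running1 and running2.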