Python program times-out when hitting this website

Time:12-21

Why does this function fail to read XML from "https://www.seattletimes.com/feed/"?

I can visit the URL in my browser just fine. The function also reads XML from other websites without a problem (for example, "https://news.ycombinator.com/rss").

import urllib.request  # a bare "import urllib" does not reliably load the request submodule


def get_url(url):
    header = {'User-Agent': 'Mozilla/5.0'}
    request = urllib.request.Request(url=url, headers=header)
    response = urllib.request.urlopen(request)
    return response.read().decode('utf-8')

url = 'https://www.seattletimes.com/feed/'

feed = get_url(url)

print(feed)

The program times out every time.
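Since the call hangs rather than failing outright, it may help to pass an explicit timeout to `urlopen` so the script gives up quickly and reports what happened. This is only a minimal sketch: the 10-second default and the `opener` parameter are illustrative additions (the latter exists only so the function can be exercised without network access) and are not part of the original script.

```python
import socket
import urllib.error
import urllib.request


def get_url(url, headers=None, timeout=10, opener=urllib.request.urlopen):
    """Return the decoded body of url, or None if the request fails or times out."""
    request = urllib.request.Request(url=url, headers=headers or {})
    try:
        # timeout (in seconds) applies to blocking socket operations such as
        # connecting and reading; without it, urlopen may wait indefinitely.
        with opener(request, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except (urllib.error.URLError, socket.timeout) as exc:
        # Connection problems usually surface as URLError; a stalled read can
        # raise socket.timeout directly.
        print(f'Request to {url} failed: {exc}')
        return None

# Example call (needs network access):
# feed = get_url('https://www.seattletimes.com/feed/', {'User-Agent': 'Mozilla/5.0'})
```

With a timeout in place, a hanging server shows up as a printed error after a few seconds instead of a stuck program.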

Ideas?

  • Maybe the header needs more info (Accept, etc.)?

EDIT1:

I replaced the request header in the script with my browser's header. Still no-go.

header = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
}
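One side effect of copying the browser's `Accept-Encoding` header: `urllib` does not decompress responses on its own, so if a server honors `gzip, deflate`, calling `.decode('utf-8')` directly on the body would fail. A hedged sketch of handling that case follows; `decode_body` is a hypothetical helper name, and the `deflate` branch assumes the server sends zlib-wrapped data.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=None, charset='utf-8'):
    """Decompress raw bytes according to Content-Encoding, then decode to text."""
    encoding = (content_encoding or '').lower()
    if encoding == 'gzip':
        raw = gzip.decompress(raw)
    elif encoding == 'deflate':
        # Assumes zlib-wrapped deflate; some servers send raw deflate instead.
        raw = zlib.decompress(raw)
    return raw.decode(charset)

# Example use with a urllib response object:
# text = decode_body(response.read(), response.headers.get('Content-Encoding'))
```

Leaving `Accept-Encoding` out of the request avoids the issue entirely, at the cost of a larger download.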

CodePudding user response:

I'm not sure why, but the User-Agent header seems to confuse the website. If you remove the header, your code works fine. I tried other header fields without any issue; the User-Agent appears to be what triggers the timeout.

import urllib.request


def get_url(url):
    request = urllib.request.Request(url=url)
    response = urllib.request.urlopen(request)
    return response.read().decode('utf-8')

url = 'https://www.seattletimes.com/feed/'

feed = get_url(url)

print(feed)
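To narrow down which header field triggers the hang, one option is to try several header sets in turn with a short timeout and record the outcome of each. The following is only a sketch of that approach: `probe_headers`, its `candidates` argument, and the injectable `opener` are hypothetical names introduced for illustration, not something from the original code.

```python
import socket
import urllib.error
import urllib.request


def probe_headers(url, candidates, timeout=5, opener=urllib.request.urlopen):
    """Try each header dict in candidates and return {label: outcome}."""
    results = {}
    for label, headers in candidates.items():
        request = urllib.request.Request(url=url, headers=headers)
        try:
            with opener(request, timeout=timeout) as response:
                response.read()
            results[label] = 'ok'
        except (urllib.error.URLError, socket.timeout) as exc:
            # Record the failure and move on to the next header set.
            results[label] = f'failed: {exc}'
    return results

# Example call (needs network access):
# probe_headers('https://www.seattletimes.com/feed/', {
#     'no headers': {},
#     'user-agent only': {'User-Agent': 'Mozilla/5.0'},
# })
```

Comparing the outcomes for the different labels shows which combination the server rejects or stalls on.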

After some debugging I found a header combination that works (keep in mind that I consider this a bug on their end):

header = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'PHPSESSID=kfdkdofsdj99g36l443862qeq2',
    'Accept-Language': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7',
}
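For completeness, this is roughly how the header combination above would be attached to a request. It is a sketch, not a guaranteed fix: the `PHPSESSID` value is the one quoted above and may no longer be valid, and `build_request` is a hypothetical helper name. Building the request does not contact the server.

```python
import urllib.request

header = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'PHPSESSID=kfdkdofsdj99g36l443862qeq2',  # session id quoted above; may have expired
    'Accept-Language': 'de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7',
}


def build_request(url, headers):
    """Create a Request carrying the given headers without sending it."""
    return urllib.request.Request(url=url, headers=headers)


request = build_request('https://www.seattletimes.com/feed/', header)

# To actually fetch the feed (needs network access):
# with urllib.request.urlopen(request, timeout=10) as response:
#     feed = response.read().decode('utf-8')
```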