Why am I only getting an empty file when I try to scrape an XML file from a site with Requests in Py

Time:09-11

I'm trying to use Python to download XML files from this site:

https://media.waec.wa.gov.au/

But both of the following examples leave me with just an empty XML file. The first one avoids the "InsecureRequestWarning" message, but the outcome of both is the same.

import shutil
import requests

# Example 1: verify against a locally saved certificate
r = requests.get('https://media.waec.wa.gov.au/2022 North West Central By-Election - LA VERBOSE RESULTS.xml', verify='~ file path for locally saved site certificate PEM file ~')
r.raw.decode_content = True
with open('~ file path for saved file ~', 'wb') as f:
    shutil.copyfileobj(r.raw, f)

# Example 2: skip certificate verification (triggers InsecureRequestWarning)
r = requests.get('https://media.waec.wa.gov.au/2022 North West Central By-Election - LA VERBOSE RESULTS.xml', verify=False)
r.raw.decode_content = True
with open('~ file path for saved file ~', 'wb') as f:
    shutil.copyfileobj(r.raw, f)

Answer:

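You get an empty file because the request did not succeed, so there was no XML body to save. When I tried your snippet I received an HTTP 403 status code. It looks like the site rejects requests that are sent with the default Requests headers; once I set a User-Agent header explicitly, the request went through.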
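A quick way to catch this kind of failure is to check the status before writing anything. This is only a minimal sketch: `save_response` is a hypothetical helper name, and it assumes it is handed an object that behaves like a `requests.Response` (with `.ok`, `.status_code` and `.content`).

```python
from pathlib import Path


def save_response(response, path):
    """Write the response body to `path` only if the request succeeded.

    `response` is assumed to behave like a requests.Response.
    Returns True if the file was written, False otherwise.
    """
    if not response.ok:
        # For example 403: there is nothing useful to save, so report it
        print(f"Request failed with HTTP {response.status_code}")
        return False
    # .content is the body as bytes, so the file is written unmodified
    Path(path).write_bytes(response.content)
    return True
```

Called as `save_response(requests.get(url), 'results.xml')` with the URL from the question, this should report the 403 rather than silently produce an empty file.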

The code below let me save the result to an XML file.

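import requests

# A custom User-Agent; with the Requests default the site answered 403
headers = {'User-Agent': 'Python User Agent'}
url = 'http://media.waec.wa.gov.au/2022 North West Central By-Election - LA VERBOSE RESULTS.xml'
res = requests.get(url, headers=headers)
res.raise_for_status()  # stop on 403 and other error codes instead of saving an error page

# Write the raw bytes so the file keeps the encoding named in its XML declaration
with open('my_file.xml', 'wb') as file:
    file.write(res.content)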
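If you would rather keep the `shutil.copyfileobj(r.raw, f)` approach from the question, note that `r.raw` only has data left to read when the request is made with `stream=True`; without it, Requests has already read the whole body by the time `get` returns. The following is a sketch under assumptions: `download_streamed` is a hypothetical helper name, and the `get` parameter exists only so the function can be tried without network access.

```python
import shutil


def download_streamed(url, path, get=None):
    """Stream `url` to `path` through response.raw.

    `get` defaults to requests.get; it is injectable purely for testing.
    """
    if get is None:
        import requests  # third-party: pip install requests
        get = requests.get

    # Header value taken from the answer above
    headers = {'User-Agent': 'Python User Agent'}
    # stream=True defers the body download, so response.raw is not yet consumed
    with get(url, headers=headers, stream=True) as response:
        response.raise_for_status()         # raises on 403 and other 4xx/5xx
        response.raw.decode_content = True  # transparently undo gzip/deflate
        with open(path, 'wb') as f:
            shutil.copyfileobj(response.raw, f)
```

For a file of this size the plain `res.content` version is probably simpler; streaming mainly pays off for large downloads.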