Home > Blockchain >  Why python request return html file instead of excel?
Why python request return html file instead of excel?

Time:03-12

I want to download excel file from this link via python https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&locale=en_US&periodView=A

Here is my code:

url = 'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'

resp = requests.get(url)
with open('file.xls','wb') as f:
    f.write(resp.content)

But the file.xls is instead a html text file. file.xls looks like this.1

I've tried add headers

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
resp = requests.get(url, headers=headers)

But it didn't help. Thank you in advance.

CodePudding user response:

You can use urllib, from this post

URL = "https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&locale=en_US&periodView=A"
import urllib.request
urllib.request.urlretrieve(URL, "file.xlsx")

CodePudding user response:

Edit: Found a way using pandas.

import pandas as pd

url = r'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'

# read into HTML tables
tables = pd.read_html(url)
# merge HTML tables
merged = pd.concat(tables)
# Write tables to excel file
merged.to_excel("output.xlsx")

Hope this helps :)

Ignore below, this was before edit:

I know this is still problematic depending on your downstream application. The code below does seem to still download it into a HTML format, but this format can be opened in excel regardless.

import requests
url = r'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'

r = requests.get(url, allow_redirects=False)
excel_url = r.url
open('out.xls', 'wb').write(r.content)

When I open this in excel I get a warning, and click okay.

screenshot of file

  • Related