I want to download excel file from this link via python https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&locale=en_US&periodView=A
Here is my code:
url = 'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'
resp = requests.get(url)
with open('file.xls','wb') as f:
f.write(resp.content)
But the file.xls is instead a html text file. file.xls looks like this.1
I've tried add headers
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
resp = requests.get(url, headers=headers)
But it didn't help. Thank you in advance.
CodePudding user response:
You can use urllib, from this post
URL = "https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&locale=en_US&periodView=A"
import urllib.request
urllib.request.urlretrieve(URL, "file.xlsx")
CodePudding user response:
Edit: Found a way using pandas.
import pandas as pd
url = r'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'
# read into HTML tables
tables = pd.read_html(url)
# merge HTML tables
merged = pd.concat(tables)
# Write tables to excel file
merged.to_excel("output.xlsx")
Hope this helps :)
Ignore below, this was before edit:
I know this is still problematic depending on your downstream application. The code below does seem to still download it into a HTML format, but this format can be opened in excel regardless.
import requests
url = r'https://www.tfex.co.th/tfex/historicalTrading.html?locale=en_US&symbol=S50Z21&decorator=excel&series=&page=4&locale=en_US&periodView=A'
r = requests.get(url, allow_redirects=False)
excel_url = r.url
open('out.xls', 'wb').write(r.content)
When I open this in excel I get a warning, and click okay.