My code:
import time
import requests
import pandas as pd
from bs4 import BeautifulSoup
URL = "https://www.hkex.com.hk/eng/stat/dmstat/dayrpt/hsitmc220303.htm"
req = requests.get(URL)
page = BeautifulSoup(req.content, 'html.parser')
table = page.find_all('pre')
df = pd.read_html(str(table), displayed_only=False)[0]
print(df)
Error message:
ValueError: No tables found
I want to load the table into a DataFrame. Any suggestions?
CodePudding user response:
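This should work, provided the URL returns JSON:
import requests
import pandas as pd
url = 'https://www.hkex.com.hk/eng/stat/dmstat/dayrpt/hsitmc220303.htm'
payload = {
    'LangCode': 'en',
    'TDD': '1',
    'TMM': '11',
    'TYYYY': '2019'}
# .json() assumes the response body is JSON with a 'data' list; it raises an
# error if the server returns HTML instead, as a .htm report page may.
jsonData = requests.get(url, params=payload).json()
rows = []
for row in jsonData['data']:
    data_row = []
    for idx, colspan in enumerate(row['colspan']):
        colspan_int = int(colspan[0])
        # Repeat the cell's values once per spanned column so columns line up
        data_row.append(row['td'][idx] * colspan_int)
    flat_list = [item for sublist in data_row for item in sublist]
    rows.append(flat_list)
# DataFrame.append was removed in pandas 2.0, so build the frame in one step
final_df = pd.DataFrame(rows)
mask = final_df[0].str.contains(r'Total market capitalisation(?!$)', na=False)
df = final_df[mask].iloc[:, :2].copy()
# `date` was never defined in the original; assumed here to be the requested date
date = f"{payload['TYYYY']}-{payload['TMM']}-{payload['TDD']}"
df['date'] = date
df.to_csv('file.csv', index=False)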
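To make the colspan loop above easier to follow, here is a small sketch that runs the same flattening on made-up rows. The row layout (each 'td' entry is a list of cell values, each 'colspan' entry starts with the span count) is an assumption inferred from the loop, not something confirmed against the real response, and flatten_row is a hypothetical helper name.

```python
# Hypothetical rows in the shape the loop above appears to expect: each 'td'
# entry is a list of cell values, each 'colspan' entry starts with the count.
sample_rows = [
    {"td": [["Market"], ["Total"]], "colspan": [["1"], ["2"]]},
    {"td": [["Main Board"], ["100"], ["200"]], "colspan": [["1"], ["1"], ["1"]]},
]


def flatten_row(row):
    """Repeat each cell once per spanned column, then flatten into one list."""
    cells = []
    for idx, colspan in enumerate(row["colspan"]):
        cells.extend(row["td"][idx] * int(colspan[0]))
    return cells


flat_rows = [flatten_row(row) for row in sample_rows]
print(flat_rows)
```

After this step every row has one entry per visible column, so the list of rows can be passed straight to pd.DataFrame.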
CodePudding user response:
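I looked at the page's source and I think I know the issue with your code.
The tag there is written as PRE, and in a case-sensitive comparison pre isn't equal to PRE.
BeautifulSoup's HTML parsers lowercase tag names, though, so find_all('pre') should still match it.
The ValueError itself comes from pd.read_html, which only looks for <table> elements, and the string you pass it contains none.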
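A minimal sketch of one possible workaround, assuming the report is whitespace-aligned text inside the PRE tag. The sample HTML, column names, and numbers below are made up for illustration and are not taken from the real page. It shows that html.parser lowercases tag names, so find_all('pre') matches either spelling, and that pd.read_fwf can parse the aligned text where pd.read_html finds no table.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the report page: fixed-width text inside an
# upper-case <PRE> tag, with no <table> element anywhere.
SAMPLE_HTML = """<html><body><PRE>
CONTRACT   OPEN   HIGH    LOW  CLOSE
MAR-22    22000  22500  21900  22400
APR-22    22050  22480  21950  22390
</PRE></body></html>"""

page = BeautifulSoup(SAMPLE_HTML, "html.parser")

# html.parser lowercases tag names, so the lower-case search finds <PRE>.
pre_blocks = page.find_all("pre")

# Drop only the surrounding newlines so the column alignment is preserved.
text = pre_blocks[0].get_text().strip("\n")

# read_fwf infers column boundaries from the whitespace-aligned text.
df = pd.read_fwf(io.StringIO(text))
print(df)
```

The real report probably has title lines and several sections, so expect to pass skiprows or explicit colspecs to read_fwf once you have looked at the actual text.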