Python cannot get table from stock exchange report

My code:

import time
import requests
import pandas as pd
from bs4 import BeautifulSoup

URL = "https://www.hkex.com.hk/eng/stat/dmstat/dayrpt/hsitmc220303.htm"

req = requests.get(URL)
page = BeautifulSoup(req.content, 'html.parser')
table = page.find_all('pre')
df = pd.read_html(str(table), displayed_only=False)[0]
print(df)

Error message:

ValueError: No tables found

I want to load the table into a DataFrame. Any suggestions?

CodePudding user response：

This should work:

import requests
import pandas as pd

url = 'https://www.hkex.com.hk/eng/stat/dmstat/dayrpt/hsitmc220303.htm'

payload = {
'LangCode': 'en',
'TDD': '1',
'TMM': '11',
'TYYYY': '2019'}

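# Note: .json() raises an error unless the response body is JSON; the .htm
# report URL above returns an HTML page, so this call needs a JSON endpoint.
jsonData = requests.get(url, params=payload).json()

rows = []
for row in jsonData['data']:
    data_row = []
    for idx, colspan in enumerate(row['colspan']):
        # Repeat each cell list to fill the columns it spans
        colspan_int = int(colspan[0])
        data_row.append(row['td'][idx] * colspan_int)
    # Flatten the per-cell lists into a single row
    flat_list = [item for sublist in data_row for item in sublist]
    rows.append(flat_list)

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
final_df = pd.DataFrame(rows)

# na=False treats missing cells as non-matches
mask = final_df[0].str.contains(r'Total market capitalisation(?!$)', na=False)
df = final_df[mask].iloc[:, :2].copy()
# Assumption: the original snippet never defined `date`; it is built here from the payload
date = f"{payload['TYYYY']}-{payload['TMM']}-{payload['TDD']}"
df['date'] = date
df.to_csv('file.csv', index=False)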

CodePudding user response：

I looked at the page's source, and I think I know the issue with your code.

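Your code doesn't work because pd.read_html only looks for <table> elements, and the report appears to be plain fixed-width text inside a <PRE> block, so there is no table to find. The capitalisation of PRE is not the problem: HTML tag names are case-insensitive, and html.parser lowercases them, so find_all('pre') still matches <PRE>.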
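
Here is a rough sketch of one way to pull the text out of the PRE block and split it into rows, using only the standard library. It has not been run against the live HKEX page. SAMPLE_HTML is a made-up stand-in for the downloaded report, so the real column names and layout will differ, and PreTextExtractor and pre_rows are hypothetical helper names chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page; the real report's
# columns and layout may differ.
SAMPLE_HTML = """<HTML><BODY><PRE>
CONTRACT   OPEN    HIGH    LOW     VOLUME
MAR-22     22,300  22,500  22,100  1,234
APR-22     22,280  22,480  22,090  567
</PRE></BODY></HTML>"""


class PreTextExtractor(HTMLParser):
    """Collects the text found inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self.in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <PRE> arrives here as 'pre'
        if tag == "pre":
            self.in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self.in_pre = False

    def handle_data(self, data):
        if self.in_pre:
            self.chunks.append(data)


def pre_rows(html):
    """Return the whitespace-separated fields of each non-blank <pre> line."""
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return [line.split() for line in text.splitlines() if line.strip()]


rows = pre_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

With pandas, pd.DataFrame(records, columns=header) would then give a DataFrame. Splitting on whitespace only works while no field is empty or contains spaces. If the real report has such fields, a fixed-width reader such as pd.read_fwf on the extracted text is probably a better fit.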