I am trying to return a specific table from this webpage.
I have tried to scrape it using beautifulsoup but that got way too complicated so I an trying to use pd.read_html instead. The table I'm after is the one with "XYZ Substantial Shareholders" (note that this is just one particular page, I would be using this code to return tables from other stocks as well).
This is the code that I'm currently using to search for the tables - my initial idea was to find the word 'Holding' in the header of the title I'm after, but it has also picked up another table which I didn't want because the word 'Holdings' appeared in an earlier table.
# Scrape the Substantial Shareholder table if it exists
sub_table = pd.read_html(current_url, match='Holding')
print(sub_table)
Current output
Company Mkt Cap 2018 A 2019 A 2020 A 2018 A 2019 A 2020 A 2018 A 2019 A 2020 A
0 Altium (ALU) $4,528M 0.1377 0.3989 0.2353 96.9595 69.3097 56.1095 1.55% 1.28% 1.46%
1 Brainchip Holdings (BRN) $2,974M 0.6838 0.0000 0.0000 0.0000 0.0000 0.0000 0.00% 0.00% 0.00%
2 Technology One (TNE) $3,299M 0.1484 0.1679 0.1574 45.3375 38.8213 33.5414 1.36% 1.59% 1.84%
3 Wisetech Global (WTC) $14,241M 1.2490 0.3624 0.2610 124.5434 91.4118 72.4917 0.15% 0.20% 0.30%
4 Xero (XRO) $16,610M 2.6697 -0.8109 18.9607 920.3138 4866.8122 243.8197 0.00% 0.00% 0.00%, Holding Name
0 176,305,508 (10.82%) Peter Van Der Made
1 119,205,377 (7.31%) Merrill Lynch (Australia) Nominees Pty Ltd
2 114,093,915 (7%) HSBC Custody Nominees (Australia) Limited]
Process finished with exit code 0
Desired output
Holding Name
0 176,305,508 (10.82%) Peter Van Der Made
1 119,205,377 (7.31%) Merrill Lynch (Australia) Nominees Pty Ltd
2 114,093,915 (7%) HSBC Custody Nominees (Australia) Limited]
Is there a way for me to specifically match only looking at headers rather than the entire page content?
CodePudding user response:
You simply need to do this:
current_url = 'https://www.intelligentinvestor.com.au/shares/asx-brn/brainchip-holdings-ltd'
sub_table = pd.read_html(current_url, match='Holding')
print(sub_table[1])
Which returns:
Holding Name
0 176,305,508 (10.82%) Peter Van Der Made
1 119,205,377 (7.31%) Merrill Lynch (Australia) Nominees Pty Ltd
2 114,093,915 (7%) HSBC Custody Nominees (Australia) Limited
while
current_url = 'https://www.intelligentinvestor.com.au/shares/asx-brn/brainchip-holdings-ltd'
sub_table = pd.read_html(current_url, match='Holding')
print(sub_table[0])
gives
Company Mkt Cap EPS Growth (%) \
Company Mkt Cap 2018 A 2019 A 2020 A
0 Altium (ALU) $4,536M 0.1377 0.4001 0.2353
1 Brainchip Holdings (BRN) $2,794M 0.6838 0.0000 0.0000
2 Technology One (TNE) $3,296M 0.1484 0.1679 0.1574
3 Wisetech Global (WTC) $14,456M 1.2490 0.3624 0.2610
4 Xero (XRO) $16,684M 2.6697 -0.8101 18.8826
P/E (%) Dividend Yield (%)
2018 A 2019 A 2020 A 2018 A 2019 A 2020 A
0 97.1284 69.3746 56.1615 1.54% 1.28% 1.46%
1 0.0000 0.0000 0.0000 0.00% 0.00% 0.00%
2 45.2931 38.7833 33.5085 1.36% 1.60% 1.84%
3 126.4269 92.7943 73.5880 0.15% 0.20% 0.29%
4 924.4426 4867.3913 244.8065 0.00% 0.00% 0.00%