I'm getting data from using print command but in Pandas DataFrame throwing result as : Empty DataFrame,Columns: [],Index: [`]
Script:
from bs4 import BeautifulSoup
import requests
import re
import json
import pandas as pd
url='http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1640132253903&t=XNAS:AAPL'
req=requests.get(url).text
#print(req)
data=re.search(r'jsonp1640132253903\((\{.*\})\)',req).group(1)
json_data=json.loads(data)['componentData']
#print(json_data)
# with open('index.html','w') as f:
# f.write(json_data)
soup=BeautifulSoup(json_data,'lxml')
for tr in soup.select('tr'):
row_data=[td.get_text(strip=True) for td in tr.select('td,th') if td.text]
if not row_data:
continue
if len(row_data) < 12:
row_data = ['Particulars'] row_data
#print(row_data)
df=pd.DataFrame(row_data)
print(df)
Print result:
['Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM']
['RevenueUSD Mil', '156,508', '170,910', '182,795', '233,715', '215,639', '229,234', '265,595', '260,174', '274,515', '365,817', '365,817']
['Gross Margin %', '43.9', '37.6', '38.6', '40.1', '39.1', '38.5', '38.3', '37.8', '38.2', '41.8', '41.8']
['Operating IncomeUSD Mil', '55,241', '48,999', '52,503', '71,230', '60,024', '61,344', '70,898', '63,930', '66,288', '108,949', '108,949']
['Operating Margin %', '35.3', '28.7', '28.7', '30.5', '27.8', '26.8', '26.7', '24.6', '24.1', '29.8', '29.8']
['Net IncomeUSD Mil', '41,733', '37,037', '39,510', '53,394', '45,687', '48,351', '59,531', '55,256', '57,411',
'94,680', '94,680']
['Earnings Per ShareUSD', '1.58', '1.42', '1.61', '2.31', '2.08', '2.30', '2.98', '2.97', '3.28', '5.61', '5.61'
Expected output:
2012-09 2013-09 2014-09 2015-09 2016-09 2017-09 2018-09 2019-09 2020-09 2021-09 TTM
Revenue USD Mil 156,508 170,910 182,795 233,715 215,639 229,234 265,595 260,174 274,515 365,817 365,817
Gross Margin % 43.9 37.6 38.6 40.1 39.1 38.5 38.3 37.8 38.2 41.8 41.8
Operating Income USD Mil 55,241 48,999 52,503 71,230 60,024 61,344 70,898 63,930 66,288 108,949 108,949
Operating Margin % 35.3 28.7 28.7 30.5 27.8 26.8 26.7 24.6 24.1 29.8 29.8
Net Income USD Mil 41,733 37,037 39,510 53,394 45,687 48,351 59,531 55,256 57,411 94,680 94,680
Earnings Per Share USD 1.58 1.42 1.61 2.31 2.08 2.30 2.98 2.97 3.28 5.61 5.61
Dividends USD 0.09 0.41 0.45 0.49 0.55 0.60 0.68 0.75 0.80 0.85 0.85
Payout Ratio % * — 27.4 28.5 22.3 24.8 26.5 23.7 25.1 23.7 16.3 15.2
Shares Mil 26,470 26,087 24,491 23,172 22,001 21,007 20,000 18,596 17,528 16,865 16,865
Book Value Per Share * USD 4.25 4.90 5.15 5.63 5.93 6.46 6.04 5.43 4.26 3.91 3.85
Operating Cash Flow USD Mil 50,856 53,666 59,713 81,266 65,824 63,598 77,434 69,391 80,674 104,038 104,038
Cap Spending USD Mil -9,402 -9,076 -9,813 -11,488 -13,548 -12,795 -13,313 -10,495 -7,309 -11,085 -11,085
Free Cash Flow USD Mil 41,454 44,590 49,900 69,778 52,276 50,803 64,121 58,896 73,365 92,953 92,953
Free Cash Flow Per Share * USD 1.58 1.61 1.93 2.96 2.24 2.41 2.88 3.07 4.04 5.57 —
Working Capital USD Mil 19,111 29,628 5,083 8,768 27,863 27,831 14,473 57,101 38,321 9,355
Expected columns:
'Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM'
CodePudding user response:
Use read_html for the DataFrame creation and then drop the na rows
json_data=json.loads(data)['componentData']
pd.read_html(json_data)[0].dropna(axis=0, how='all')
CodePudding user response:
@QHarr's answer is by far the most straightforward (and you should probably accept it), but in case you are wondering what is wrong with your code, it's that you are resetting the variable row_data
for every iteration of the loop.
To make your code work, you can instead store each row as an element in a list. Then to build a DataFrame, you can pass this list of rows and the column names to pd.DataFrame
:
data = []
soup=BeautifulSoup(json_data,'lxml')
for tr in soup.select('tr'):
row_data=[td.get_text(strip=True) for td in tr.select('td,th') if td.text]
if not row_data:
continue
elif len(row_data) < 12:
columns = ['Particulars'] row_data
else:
data.append(row_data)
df=pd.DataFrame(data, columns=columns)
Result:
>>> df
Particulars 2012-09 2013-09 2014-09 2015-09 2016-09 2017-09 2018-09 2019-09 2020-09 2021-09 TTM
0 RevenueUSD Mil 156,508 170,910 182,795 233,715 215,639 229,234 265,595 260,174 274,515 365,817 365,817
1 Gross Margin % 43.9 37.6 38.6 40.1 39.1 38.5 38.3 37.8 38.2 41.8 41.8
2 Operating IncomeUSD Mil 55,241 48,999 52,503 71,230 60,024 61,344 70,898 63,930 66,288 108,949 108,949
3 Operating Margin % 35.3 28.7 28.7 30.5 27.8 26.8 26.7 24.6 24.1 29.8 29.8
4 Net IncomeUSD Mil 41,733 37,037 39,510 53,394 45,687 48,351 59,531 55,256 57,411 94,680 94,680
5 Earnings Per ShareUSD 1.58 1.42 1.61 2.31 2.08 2.30 2.98 2.97 3.28 5.61 5.61
6 DividendsUSD 0.09 0.41 0.45 0.49 0.55 0.60 0.68 0.75 0.80 0.85 0.85
7 Payout Ratio % * — 27.4 28.5 22.3 24.8 26.5 23.7 25.1 23.7 16.3 15.2
8 SharesMil 26,470 26,087 24,491 23,172 22,001 21,007 20,000 18,596 17,528 16,865 16,865
9 Book Value Per Share *USD 4.25 4.90 5.15 5.63 5.93 6.46 6.04 5.43 4.26 3.91 3.85
10 Operating Cash FlowUSD Mil 50,856 53,666 59,713 81,266 65,824 63,598 77,434 69,391 80,674 104,038 104,038
11 Cap SpendingUSD Mil -9,402 -9,076 -9,813 -11,488 -13,548 -12,795 -13,313 -10,495 -7,309 -11,085 -11,085
12 Free Cash FlowUSD Mil 41,454 44,590 49,900 69,778 52,276 50,803 64,121 58,896 73,365 92,953 92,953
13 Free Cash Flow Per Share *USD 1.58 1.61 1.93 2.96 2.24 2.41 2.88 3.07 4.04 5.57 —
14 Working CapitalUSD Mil 19,111 29,628 5,083 8,768 27,863 27,831 14,473 57,101 38,321 9,355 —