Home > OS >  Getting Empty DataFrame in pandas from table data
Getting Empty DataFrame in pandas from table data

Time:12-22

I'm getting data from using print command but in Pandas DataFrame throwing result as : Empty DataFrame,Columns: [],Index: [`]

Script:

from bs4 import BeautifulSoup
import requests
import re
import json
import pandas as pd

url='http://financials.morningstar.com/finan/financials/getFinancePart.html?&callback=jsonp1640132253903&t=XNAS:AAPL'

req=requests.get(url).text
#print(req)
data=re.search(r'jsonp1640132253903\((\{.*\})\)',req).group(1)
json_data=json.loads(data)['componentData']
#print(json_data)
# with open('index.html','w') as f:
#     f.write(json_data)

soup=BeautifulSoup(json_data,'lxml')
for tr in soup.select('tr'):
    row_data=[td.get_text(strip=True) for td in tr.select('td,th') if td.text]
    if not row_data:
         continue

    if len(row_data) < 12:
        row_data = ['Particulars']   row_data
    #print(row_data)
                         
df=pd.DataFrame(row_data)
print(df)

Print result:

['Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM'] ['RevenueUSD Mil', '156,508', '170,910', '182,795', '233,715', '215,639', '229,234', '265,595', '260,174', '274,515', '365,817', '365,817'] ['Gross Margin %', '43.9', '37.6', '38.6', '40.1', '39.1', '38.5', '38.3', '37.8', '38.2', '41.8', '41.8']
['Operating IncomeUSD Mil', '55,241', '48,999', '52,503', '71,230', '60,024', '61,344', '70,898', '63,930', '66,288', '108,949', '108,949'] ['Operating Margin %', '35.3', '28.7', '28.7', '30.5', '27.8', '26.8', '26.7', '24.6', '24.1', '29.8', '29.8']
['Net IncomeUSD Mil', '41,733', '37,037', '39,510', '53,394', '45,687', '48,351', '59,531', '55,256', '57,411', '94,680', '94,680'] ['Earnings Per ShareUSD', '1.58', '1.42', '1.61', '2.31', '2.08', '2.30', '2.98', '2.97', '3.28', '5.61', '5.61'

Expected output:

2012-09 2013-09 2014-09 2015-09 2016-09 2017-09 2018-09 2019-09 2020-09 2021-09 TTM

Revenue USD Mil 156,508 170,910 182,795 233,715 215,639 229,234 265,595 260,174 274,515 365,817 365,817
Gross Margin %  43.9    37.6    38.6    40.1    39.1    38.5    38.3    37.8    38.2    41.8    41.8
Operating Income USD Mil    55,241  48,999  52,503  71,230  60,024  61,344  70,898  63,930  66,288  108,949 108,949
Operating Margin %  35.3    28.7    28.7    30.5    27.8    26.8    26.7    24.6    24.1    29.8    29.8
Net Income USD Mil  41,733  37,037  39,510  53,394  45,687  48,351  59,531  55,256  57,411  94,680  94,680
Earnings Per Share USD  1.58    1.42    1.61    2.31    2.08    2.30    2.98    2.97    3.28    5.61    5.61
Dividends USD   0.09    0.41    0.45    0.49    0.55    0.60    0.68    0.75    0.80    0.85    0.85
Payout Ratio % *    —   27.4    28.5    22.3    24.8    26.5    23.7    25.1    23.7    16.3    15.2
Shares Mil  26,470  26,087  24,491  23,172  22,001  21,007  20,000  18,596  17,528  16,865  16,865
Book Value Per Share * USD  4.25    4.90    5.15    5.63    5.93    6.46    6.04    5.43    4.26    3.91    3.85
Operating Cash Flow USD Mil 50,856  53,666  59,713  81,266  65,824  63,598  77,434  69,391  80,674  104,038 104,038
Cap Spending USD Mil    -9,402  -9,076  -9,813  -11,488 -13,548 -12,795 -13,313 -10,495 -7,309  -11,085 -11,085
Free Cash Flow USD Mil  41,454  44,590  49,900  69,778  52,276  50,803  64,121  58,896  73,365  92,953  92,953
Free Cash Flow Per Share * USD  1.58    1.61    1.93    2.96    2.24    2.41    2.88    3.07    4.04    5.57    —
Working Capital USD Mil 19,111  29,628  5,083   8,768   27,863  27,831  14,473  57,101  38,321  9,355

Expected columns:

'Particulars', '2012-09', '2013-09', '2014-09', '2015-09', '2016-09', '2017-09', '2018-09', '2019-09', '2020-09', '2021-09', 'TTM'  

CodePudding user response:

Use read_html for the DataFrame creation and then drop the na rows

json_data=json.loads(data)['componentData']
pd.read_html(json_data)[0].dropna(axis=0, how='all')

CodePudding user response:

@QHarr's answer is by far the most straightforward (and you should probably accept it), but in case you are wondering what is wrong with your code, it's that you are resetting the variable row_data for every iteration of the loop.

To make your code work, you can instead store each row as an element in a list. Then to build a DataFrame, you can pass this list of rows and the column names to pd.DataFrame:

data = []
soup=BeautifulSoup(json_data,'lxml')
for tr in soup.select('tr'):
    row_data=[td.get_text(strip=True) for td in tr.select('td,th') if td.text]
    if not row_data:
        continue
    elif len(row_data) < 12:
        columns = ['Particulars']   row_data
    else:
        data.append(row_data)
                         
df=pd.DataFrame(data, columns=columns)

Result:

>>> df
                      Particulars  2012-09  2013-09  2014-09  2015-09  2016-09  2017-09  2018-09  2019-09  2020-09  2021-09      TTM
0                  RevenueUSD Mil  156,508  170,910  182,795  233,715  215,639  229,234  265,595  260,174  274,515  365,817  365,817
1                  Gross Margin %     43.9     37.6     38.6     40.1     39.1     38.5     38.3     37.8     38.2     41.8     41.8
2         Operating IncomeUSD Mil   55,241   48,999   52,503   71,230   60,024   61,344   70,898   63,930   66,288  108,949  108,949
3              Operating Margin %     35.3     28.7     28.7     30.5     27.8     26.8     26.7     24.6     24.1     29.8     29.8
4               Net IncomeUSD Mil   41,733   37,037   39,510   53,394   45,687   48,351   59,531   55,256   57,411   94,680   94,680
5           Earnings Per ShareUSD     1.58     1.42     1.61     2.31     2.08     2.30     2.98     2.97     3.28     5.61     5.61
6                    DividendsUSD     0.09     0.41     0.45     0.49     0.55     0.60     0.68     0.75     0.80     0.85     0.85
7                Payout Ratio % *        —     27.4     28.5     22.3     24.8     26.5     23.7     25.1     23.7     16.3     15.2
8                       SharesMil   26,470   26,087   24,491   23,172   22,001   21,007   20,000   18,596   17,528   16,865   16,865
9       Book Value Per Share *USD     4.25     4.90     5.15     5.63     5.93     6.46     6.04     5.43     4.26     3.91     3.85
10     Operating Cash FlowUSD Mil   50,856   53,666   59,713   81,266   65,824   63,598   77,434   69,391   80,674  104,038  104,038
11            Cap SpendingUSD Mil   -9,402   -9,076   -9,813  -11,488  -13,548  -12,795  -13,313  -10,495   -7,309  -11,085  -11,085
12          Free Cash FlowUSD Mil   41,454   44,590   49,900   69,778   52,276   50,803   64,121   58,896   73,365   92,953   92,953
13  Free Cash Flow Per Share *USD     1.58     1.61     1.93     2.96     2.24     2.41     2.88     3.07     4.04     5.57        —
14         Working CapitalUSD Mil   19,111   29,628    5,083    8,768   27,863   27,831   14,473   57,101   38,321    9,355        —
  • Related