Why am I getting Empty Dataframe message when using concat

Time:12-24

I am trying to view the predicted price for MSFT, following a book titled 'Machine learning and data science blueprints for finance'. It provides sample code as a case study to predict the future stock price of MSFT. However, when I debug it, the terminal shows the following message:

Empty DataFrame
Columns: [MSFT_pred, GOOGL, DEXJPUS, DEXUSUK, SP500, DJIA, VIXCLS, MSFT_DT, MSFT_3DT, 
MSFT_6DT, MSFT_12DT]

The code is below

from pandas_datareader import data
import yfinance as yf
import pandas_datareader as web
import numpy as np
import pandas as pd

return_period = 5

stk_tickers = ['MSFT','AAPL','GOOGL']

stk_data = yf.download(stk_tickers, start = '2012-01-01', end='2017-01-01')


ccy_tickers = ['DEXJPUS','DEXUSUK']
ccy_data = web.DataReader(ccy_tickers,'fred')


idx_tickers = ['SP500','DJIA','VIXCLS']
idx_data = web.DataReader(idx_tickers,'fred')


Y = np.log(stk_data.loc[:, ('Adj Close', 'MSFT')]).diff(return_period).shift(-return_period)
Y.name = Y.name[-1] + '_pred'

X1 = np.log(stk_data.loc[:, ('Adj Close', ('GOOGL', 'IBM'))]).diff(return_period)
X1.columns = X1.columns.droplevel()
X2 = np.log(ccy_data).diff(return_period)
X3 = np.log(idx_data).diff(return_period)

X4 = pd.concat(
    [np.log(stk_data.loc[:, ('Adj Close', 'MSFT')]).diff(i)
     for i in [return_period, return_period*3, return_period*6, return_period*12]],
    axis=1).dropna()
X4.columns = ['MSFT_DT', 'MSFT_3DT', 'MSFT_6DT', 'MSFT_12DT']

X = pd.concat([X1, X2, X3, X4], axis=1)


dataset = pd.concat([Y, X], axis=1).dropna().iloc[::return_period, :]
Y = dataset.loc[:, Y.name]
X = dataset.loc[:, X.columns]
print(dataset)

I know the data is present because when I print 'X' or 'Y' on their own, they contain values. It must be an issue with concat.

CodePudding user response:

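This is because dropna() drops every row that contains a NaN, and here every row contains at least one: the stock columns only have values for the early dates and the FRED columns only for the recent dates, so no row is complete. To see this, replace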

dataset = pd.concat([Y, X], axis=1).dropna().iloc[::return_period, :]

with

dataset = pd.concat([Y, X], axis=1).iloc[::return_period, :]

which returns:

           MSFT_pred  GOOGL  DEXJPUS  DEXUSUK  SP500  DJIA  VIXCLS  MSFT_DT  \
2012-01-03       0.04    NaN      NaN      NaN    NaN   NaN     NaN      NaN   
2012-01-10       0.01  -0.07      NaN      NaN    NaN   NaN     NaN      NaN   
2012-01-18       0.05   0.02      NaN      NaN    NaN   NaN     NaN      NaN   
2012-01-25       0.01  -0.11      NaN      NaN    NaN   NaN     NaN      NaN   
2012-02-01       0.03   0.02      NaN      NaN    NaN   NaN     NaN      NaN   
...               ...    ...      ...      ...    ...   ...     ...      ...   
2021-11-23        NaN    NaN     0.00    -0.00  -0.00 -0.01    0.17      NaN   
2021-11-30        NaN    NaN    -0.02    -0.01  -0.03 -0.04    0.34      NaN   
2021-12-07        NaN    NaN     0.00    -0.00   0.03  0.04   -0.22      NaN   
2021-12-14        NaN    NaN     0.00    -0.00  -0.01 -0.00    0.00      NaN   
2021-12-21        NaN    NaN      NaN      NaN   0.00 -0.00     NaN      NaN   

            MSFT_3DT  MSFT_6DT  MSFT_12DT  
2012-01-03       NaN       NaN        NaN  
2012-01-10       NaN       NaN        NaN  
2012-01-18       NaN       NaN        NaN  
2012-01-25       NaN       NaN        NaN  
2012-02-01       NaN       NaN        NaN  
...              ...       ...        ...  
2021-11-23       NaN       NaN        NaN  
2021-11-30       NaN       NaN        NaN  
2021-12-07       NaN       NaN        NaN  
2021-12-14       NaN       NaN        NaN  
2021-12-21       NaN       NaN        NaN  

[512 rows x 11 columns]

Do the NaN house-keeping afterwards, once you can see which columns and dates are actually missing.
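
The same effect can be reproduced without downloading anything. The sketch below is only an illustration under assumptions: the values are random numbers, the date ranges are made up to mimic the output above, and names such as `spans` and `X_aligned` are hypothetical helpers that do not appear in the original code. It shows that concatenating frames whose dates do not overlap and then calling dropna() leaves nothing, and that the rows survive once both inputs cover the same dates.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins (assumed values, not real market data):
# the "stock" target covers 2012-2016, the "FRED" feature only later years.
stock_idx = pd.bdate_range("2012-01-02", "2016-12-30")
fred_idx = pd.bdate_range("2017-01-02", "2021-12-21")

rng = np.random.default_rng(0)
Y = pd.Series(rng.normal(size=len(stock_idx)), index=stock_idx, name="MSFT_pred")
X = pd.DataFrame({"SP500": rng.normal(size=len(fred_idx))}, index=fred_idx)

# concat with axis=1 aligns on the union of both indexes and fills gaps with NaN.
combined = pd.concat([Y, X], axis=1)

# Diagnostic: first and last date on which each column holds a real value.
spans = pd.DataFrame(
    {
        "first_valid": {c: combined[c].first_valid_index() for c in combined.columns},
        "last_valid": {c: combined[c].last_valid_index() for c in combined.columns},
    }
)
print(spans)

# Every row is missing either the target or the feature, so nothing survives.
empty = combined.dropna()
print(len(empty))

# With a feature that covers the same dates as the target, dropna() keeps the rows.
X_aligned = pd.DataFrame({"SP500": rng.normal(size=len(stock_idx))}, index=stock_idx)
dataset = pd.concat([Y, X_aligned], axis=1).dropna()
print(len(dataset))
```

In the question's code, the likely equivalent of `X_aligned` is to request the FRED series for the same period as the stock data, for example by passing the same `start` and `end` values to `web.DataReader` that are passed to `yf.download`. This is a suggestion based on the date ranges visible in the output above, not something the original answer states.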
