I am trying to predict the price of MSFT, following the book 'Machine Learning and Data Science Blueprints for Finance'. The book provides sample code as a case study for predicting the future stock price of MSFT. However, when I run it, the terminal prints the following:
Empty DataFrame
Columns: [MSFT_pred, GOOGL, DEXJPUS, DEXUSUK, SP500, DJIA, VIXCLS, MSFT_DT, MSFT_3DT,
MSFT_6DT, MSFT_12DT]
The code is below
from pandas_datareader import data
import yfinance as yf
import pandas_datareader as web
import numpy as np
import pandas as pd
return_period = 5
stk_tickers = ['MSFT','AAPL','GOOGL']
stk_data = yf.download(stk_tickers, start = '2012-01-01', end='2017-01-01')
ccy_tickers = ['DEXJPUS','DEXUSUK']
ccy_data = web.DataReader(ccy_tickers,'fred')
idx_tickers = ['SP500','DJIA','VIXCLS']
idx_data = web.DataReader(idx_tickers,'fred')
Y = np.log(stk_data.loc[:, ('Adj Close', 'MSFT')]).diff(return_period).shift(-return_period)
Y.name = Y.name[-1] + '_pred'
X1 = np.log(stk_data.loc[:, ('Adj Close', ('GOOGL', 'IBM'))]).diff(return_period)
X1.columns = X1.columns.droplevel()
X2 = np.log(ccy_data).diff(return_period)
X3 = np.log(idx_data).diff(return_period)
X4 = pd.concat([np.log(stk_data.loc[:, ('Adj Close', 'MSFT')]).diff(i)
                for i in [return_period, return_period*3, return_period*6, return_period*12]],
               axis=1).dropna()
X4.columns = ['MSFT_DT', 'MSFT_3DT', 'MSFT_6DT', 'MSFT_12DT']
X = pd.concat([X1, X2, X3, X4], axis=1)
dataset = pd.concat([Y, X], axis=1).dropna().iloc[::return_period, :]
Y = dataset.loc[:, Y.name]
X = dataset.loc[:, X.columns]
print(dataset)
I know the data is present because when I print 'X' or 'Y' on their own, the values show up. It must be an issue with concat.
Answer:
This is because dropna() removes every row that contains a NaN, and here every row has at least one, so nothing is left. You should replace
dataset = pd.concat([Y, X], axis=1).dropna().iloc[::return_period, :]
with
dataset = pd.concat([Y, X], axis=1).iloc[::return_period, :]
which returns:
MSFT_pred GOOGL DEXJPUS DEXUSUK SP500 DJIA VIXCLS MSFT_DT \
2012-01-03 0.04 NaN NaN NaN NaN NaN NaN NaN
2012-01-10 0.01 -0.07 NaN NaN NaN NaN NaN NaN
2012-01-18 0.05 0.02 NaN NaN NaN NaN NaN NaN
2012-01-25 0.01 -0.11 NaN NaN NaN NaN NaN NaN
2012-02-01 0.03 0.02 NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
2021-11-23 NaN NaN 0.00 -0.00 -0.00 -0.01 0.17 NaN
2021-11-30 NaN NaN -0.02 -0.01 -0.03 -0.04 0.34 NaN
2021-12-07 NaN NaN 0.00 -0.00 0.03 0.04 -0.22 NaN
2021-12-14 NaN NaN 0.00 -0.00 -0.01 -0.00 0.00 NaN
2021-12-21 NaN NaN NaN NaN 0.00 -0.00 NaN NaN
MSFT_3DT MSFT_6DT MSFT_12DT
2012-01-03 NaN NaN NaN
2012-01-10 NaN NaN NaN
2012-01-18 NaN NaN NaN
2012-01-25 NaN NaN NaN
2012-02-01 NaN NaN NaN
... ... ... ...
2021-11-23 NaN NaN NaN
2021-11-30 NaN NaN NaN
2021-12-07 NaN NaN NaN
2021-12-14 NaN NaN NaN
2021-12-21 NaN NaN NaN
[512 rows x 11 columns]
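The output above suggests that the stock columns and the FRED columns have values in different date ranges (2012 rows versus 2021 rows), so no row is complete. A quick way to confirm this is to count the NaN values per column and look at the first and last date on which each column has a value. The sketch below uses small synthetic frames as stand-ins for the real downloads; the frame names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the real data (hypothetical values, no download needed).
stock_idx = pd.bdate_range("2012-01-02", periods=10)
fred_idx = pd.bdate_range("2021-01-04", periods=10)
stocks = pd.DataFrame({"MSFT_pred": np.arange(10.0), "GOOGL": np.arange(10.0)},
                      index=stock_idx)
fred = pd.DataFrame({"SP500": np.arange(10.0), "VIXCLS": np.arange(10.0)},
                    index=fred_idx)

# Same kind of column-wise concat as in the question.
combined = pd.concat([stocks, fred], axis=1)

# Number of NaN values in each column.
nan_counts = combined.isna().sum()

# Date span in which each column actually has values.
coverage = pd.DataFrame({
    "first_valid": combined.apply(lambda col: col.first_valid_index()),
    "last_valid": combined.apply(lambda col: col.last_valid_index()),
})

print(nan_counts)
print(coverage)
print("dropna() leaves an empty frame:", combined.dropna().empty)
```

If the spans in `coverage` do not overlap across columns, dropna() on the combined frame will always return an empty DataFrame.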
Do the housekeeping (handling the remaining NaN values) afterwards.
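One possible form of that housekeeping, sketched here under assumptions, is to find the date window in which every column has data and only then drop incomplete rows. The frames below are synthetic stand-ins with a partial overlap; `start`, `end` and `cleaned` are illustrative names, not part of the book's code.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins (hypothetical): two sources with partially overlapping dates.
stocks = pd.DataFrame({"MSFT_pred": np.arange(10.0)},
                      index=pd.bdate_range("2021-01-04", periods=10))
fred = pd.DataFrame({"SP500": np.arange(10.0)},
                    index=pd.bdate_range("2021-01-11", periods=10))

combined = pd.concat([stocks, fred], axis=1)

# Window in which every column has at least started and not yet ended.
start = max(combined[c].first_valid_index() for c in combined.columns)
end = min(combined[c].last_valid_index() for c in combined.columns)

# Fail with a clear message instead of silently producing an empty frame.
if start > end:
    raise ValueError("The columns share no common date range")

# Keep only the common window, then drop any remaining incomplete rows.
cleaned = combined.loc[start:end].dropna()
print(start.date(), end.date(), cleaned.shape)
```

When the ranges overlap, a plain dropna() gives the same rows; the explicit window check mainly makes the failure visible when they do not. In the original script, the root cause is likely that the FRED series and the stock series are requested for different periods, so passing the same start and end dates to both downloads (assuming `web.DataReader` accepts `start` and `end` arguments, as it does in current pandas-datareader releases) should let the original dropna() line work unchanged.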