I have a dataframe and I want to build a new dataframe containing features derived from it.
Here is some dummy code:
import pandas as pd
import numpy as np
df_msft = [['2020-1-1', 10, 11], ['2020-1-2', 15, 20], ['2020-1-3', 14, 12]]
df1 = pd.DataFrame(df_msft, columns=['datetime', 'price_open', 'price_close'])
# Build a 'features' dataframe that will hold the features of the stocks in question
features = pd.DataFrame(index=df1.datetime).sort_index()
features['daily_change'] = df1.price_close / df1.price_open - 1  # daily return
features['pct_change_on_day'] = df1.price_open / df1.price_close.shift(1) - 1
When I do this, my 'features' dataframe is filled with NaN values. Does anyone know why?
CodePudding user response:
Use:
features.loc[:, 'daily_change'] = (df1.price_close / df1.price_open - 1).values  # .values bypasses index alignment
CodePudding user response:
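Your features and df1 dataframes have different indexes: features is indexed by the "datetime" strings, while df1 still has the default integer index. On assignment, pandas aligns the values by index label, finds no matching labels, and fills every row with NaN.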
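To see the mismatch directly, here is a minimal sketch that rebuilds the question's data and checks the two indexes. The variable name rows is only illustrative; the reindex call is used here to mimic what column assignment does with a Series, as far as I understand pandas' alignment behaviour.

```python
import pandas as pd

rows = [['2020-1-1', 10, 11], ['2020-1-2', 15, 20], ['2020-1-3', 14, 12]]
df1 = pd.DataFrame(rows, columns=['datetime', 'price_open', 'price_close'])
features = pd.DataFrame(index=df1.datetime).sort_index()

# df1 keeps the default integer index; features is indexed by the date strings.
print(df1.index.tolist())
print(features.index.tolist())

# None of the labels in features.index appear in df1.index.
print(features.index.isin(df1.index).any())

# Assigning a Series is roughly equivalent to reindexing it on features.index first,
# so every label is looked up in df1's index, not found, and replaced by NaN.
daily_change = df1.price_close / df1.price_open - 1
print(daily_change.reindex(features.index))

features['daily_change'] = daily_change
print(features['daily_change'].isna().all())
```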
You can overcome this by assigning the numpy array:
features['daily_change'] = (df1.price_close/df1.price_open-1).values # daily return
features['pct_change_on_day'] = (df1.price_open/df1.price_close.shift(1)-1).values
output:
          daily_change  pct_change_on_day
datetime
2020-1-1      0.100000                NaN
2020-1-2      0.333333           0.363636
2020-1-3     -0.142857          -0.300000
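One caveat with assigning the raw array: it is purely positional, so it only gives the right answer when features and df1 have their rows in the same order. The sketch below uses hypothetical unsorted input (not from the question) to show how sort_index() on features can silently misplace the values, and how attaching the date labels first avoids that.

```python
import pandas as pd

# Hypothetical variation of the question's data: same rows, but not in date order.
rows = [['2020-1-3', 14, 12], ['2020-1-1', 10, 11], ['2020-1-2', 15, 20]]
df1 = pd.DataFrame(rows, columns=['datetime', 'price_open', 'price_close'])
features = pd.DataFrame(index=df1.datetime).sort_index()  # now sorted by date

change = df1.price_close / df1.price_open - 1

# Positional: values land in df1's row order, which no longer matches features.
features['positional'] = change.to_numpy()

# Label-based: give the values the date labels, then let pandas align them.
features['aligned'] = pd.Series(change.to_numpy(), index=df1.datetime)

print(features)
```

For '2020-1-1' the aligned column holds 11/10 - 1, while the positional column holds the value that belongs to '2020-1-3'.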
Or, better, make "datetime" the index of df1:
df1 = df1.set_index('datetime')
features['daily_change'] = (df1.price_close/df1.price_open-1) # daily return
features['pct_change_on_day'] = (df1.price_open/df1.price_close.shift(1)-1)
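Put together as a self-contained script, the index-based approach might look like the sketch below. Setting the index before creating features is my own ordering choice, not something the answer prescribes; it just guarantees both dataframes share the same labels from the start.

```python
import pandas as pd

rows = [['2020-1-1', 10, 11], ['2020-1-2', 15, 20], ['2020-1-3', 14, 12]]
df1 = pd.DataFrame(rows, columns=['datetime', 'price_open', 'price_close'])

# Index df1 by date first, so every derived Series carries the date labels.
df1 = df1.set_index('datetime').sort_index()

# features reuses df1's index, so label alignment matches every row.
features = pd.DataFrame(index=df1.index)
features['daily_change'] = df1.price_close / df1.price_open - 1  # daily return
features['pct_change_on_day'] = df1.price_open / df1.price_close.shift(1) - 1

print(features)
```

The first row of pct_change_on_day is still NaN, but that is expected: shift(1) has no previous close for the first date.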