I have a dataset of hourly prices where I have produced a dataframe that contains the minimum price from the previous day using:
df_min = df_hour_0[['Price_REG1', 'Price_REG2', 'Price_REG3',
'Price_REG4']].between_time('00:00', '23:00').resample('d').min()
This gives me:
Price_REG1 Price_REG2 Price_REG3 Price_REG4
date
2020-01-01 00:00:00 25.07 25.07 25.07 25.07
2020-01-02 00:00:00 12.07 12.07 12.07 12.07
2020-01-03 00:00:00 0.14 0.14 0.14 0.14
2020-01-04 00:00:00 3.83 3.83 3.83 3.83
2020-01-05 00:00:00 25.77 25.77 25.77 25.77
Now, I want to combine this df with 24 other df's, one for each hour (hour_0 below):
Price_REG1 Price_REG2 ... Price_24_3 Price_24_4
date ...
2020-01-01 00:00:00 30.83 30.83 ... NaN NaN
2020-01-02 00:00:00 24.81 24.81 ... 25.88 25.88
2020-01-03 00:00:00 24.39 24.39 ... 27.69 27.69
2020-01-04 00:00:00 22.04 22.04 ... 25.70 25.70
2020-01-05 00:00:00 25.77 25.77 ... 27.37 27.37
Which I do this way:
df_hour_0 = pd.concat([df_hour_0, df_min, df_max], axis=1)
This works fine for the df from the first hour, since the dates matches. But for the other df's the date is "2020-01-01 00:01:00", "2020-01-01 00:02:00" etc.
Since the dates don't match, the pd.concat gives me two times as many observations where every other observation is null:
Price_REG1 Price_REG2 ... Price_3_min Price_4_min
date ...
2020-01-01 00:00:00 NaN NaN ... NaN NaN
2020-01-01 01:00:00 28.78 28.78 ... NaN NaN
2020-01-02 00:00:00 NaN NaN ... 30.83 30.83
2020-01-02 01:00:00 12.07 12.07 ... NaN NaN
2020-01-03 00:00:00 NaN NaN ... 31.20 31.20
I tried to fix this by:
df_max = df_max.reset_index()
df_max = df_max.drop(['date'], axis=1)
But this only gives me the same issue but instead of every other being null the whole df_min df is just inserted at the bottom of the first df.
I want to keep the date, otherwise I guess it could be possible to reset the index in both df's and combine them by index instead of date.
Thank you.
CodePudding user response:
One option could be to normalize to the date:
dfs = [df_hour_0, df_min, df_max]
pd.concat([d.set_axis(d.index.normalize()) for d in dfs], axis=1)