Combining dataframes with differing dates column

Time:11-09

I have a dataset of hourly prices, from which I produce a dataframe containing each day's minimum price using:

df_min = df_hour_0[['Price_REG1', 'Price_REG2', 'Price_REG3',
                    'Price_REG4']].between_time('00:00', '23:00').resample('d').min()

This gives me:

                       Price_REG1  Price_REG2  Price_REG3  Price_REG4
date                                                      
2020-01-01 00:00:00     25.07       25.07       25.07       25.07
2020-01-02 00:00:00     12.07       12.07       12.07       12.07
2020-01-03 00:00:00     0.14        0.14        0.14        0.14
2020-01-04 00:00:00     3.83        3.83        3.83        3.83
2020-01-05 00:00:00     25.77       25.77       25.77       25.77

Now, I want to combine this df with the 24 hourly df's, one for each hour of the day (df_hour_0 shown below):

                     Price_REG1  Price_REG2  ...  Price_24_3  Price_24_4
date                                         ...                        
2020-01-01 00:00:00       30.83       30.83  ...         NaN         NaN
2020-01-02 00:00:00       24.81       24.81  ...       25.88       25.88
2020-01-03 00:00:00       24.39       24.39  ...       27.69       27.69
2020-01-04 00:00:00       22.04       22.04  ...       25.70       25.70
2020-01-05 00:00:00       25.77       25.77  ...       27.37       27.37

Which I do this way:

df_hour_0 = pd.concat([df_hour_0, df_min, df_max], axis=1)

This works fine for the df from the first hour, since the dates match. But for the other df's the timestamps are "2020-01-01 01:00:00", "2020-01-01 02:00:00", etc.

Since the timestamps don't match, pd.concat gives me twice as many rows, where every other row is null in half the columns:

                     Price_REG1  Price_REG2  ...  Price_3_min  Price_4_min
date                                         ...                          
2020-01-01 00:00:00         NaN         NaN  ...          NaN          NaN
2020-01-01 01:00:00       28.78       28.78  ...          NaN          NaN
2020-01-02 00:00:00         NaN         NaN  ...        30.83        30.83
2020-01-02 01:00:00       12.07       12.07  ...          NaN          NaN
2020-01-03 00:00:00         NaN         NaN  ...        31.20        31.20
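The misalignment above can be reproduced with a minimal sketch (hypothetical two-row frames and a single price column, standing in for the real data): pd.concat with axis=1 aligns rows on the index, and since no timestamp occurs in both frames, every row ends up null on one side.

```python
import pandas as pd

# Hourly prices stamped at 01:00 vs. daily minima stamped at 00:00
idx_hour = pd.to_datetime(["2020-01-01 01:00", "2020-01-02 01:00"])
idx_min = pd.to_datetime(["2020-01-01 00:00", "2020-01-02 00:00"])

df_hour = pd.DataFrame({"Price_REG1": [28.78, 12.07]}, index=idx_hour)
df_min = pd.DataFrame({"Price_REG1_min": [25.07, 12.07]}, index=idx_min)

# axis=1 concat is an outer join on the index: the result has the
# union of both indexes, so each row is NaN in one frame's columns
combined = pd.concat([df_hour, df_min], axis=1)
print(combined)
```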

I tried to fix this by:

df_max = df_max.reset_index()
df_max = df_max.drop(['date'], axis=1)

But this gives me a similar issue: instead of every other row being null, the whole frame is simply appended at the bottom of the first df.

I want to keep the date; otherwise I guess I could reset the index in both df's and combine them by position instead of by date.

Thank you.

CodePudding user response:

One option is to normalize each index to the date (i.e. floor every timestamp to midnight) before concatenating:

dfs = [df_hour_0, df_min, df_max]
# normalize() floors each timestamp to midnight, so all indexes align
pd.concat([d.set_axis(d.index.normalize()) for d in dfs], axis=1)
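On the same hypothetical two-row frames as above, this is how the fix plays out: after DatetimeIndex.normalize() both frames are indexed by plain dates, so the rows line up and no NaNs are introduced.

```python
import pandas as pd

# Same hypothetical frames: prices at 01:00, daily minima at 00:00
idx_hour = pd.to_datetime(["2020-01-01 01:00", "2020-01-02 01:00"])
idx_min = pd.to_datetime(["2020-01-01 00:00", "2020-01-02 00:00"])

df_hour_1 = pd.DataFrame({"Price_REG1": [28.78, 12.07]}, index=idx_hour)
df_min = pd.DataFrame({"Price_REG1_min": [25.07, 12.07]}, index=idx_min)

# set_axis replaces the index with its normalized (midnight) version,
# so both frames now share the same DatetimeIndex and concat aligns them
dfs = [df_hour_1, df_min]
out = pd.concat([d.set_axis(d.index.normalize()) for d in dfs], axis=1)
print(out)
```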