I have two dataframes df1 and df2. df1 has a daily time frame (with hours, minutes and seconds) and the other has a monthly time frame as shown below:
df1:
Index Value
2016-01-01 00:00:00 99
2016-01-02 00:00:00 105
2016-01-03 00:00:00 68
df2 (which needs downscaling to daily):
Index Date Value1 .............................Value 10
0 31/01/2016 15 ............................ 98
1 28/02/2016 25 ............................ 92
2 31/03/2016 58 ............................ 68
I want to have an output like this:
Index Date Value Value1 ..........................Value 10
0 2016-01-01 00:00:00 99 15 ......................... 98
1 2016-01-02 00:00:00 105 15 ......................... 98
2 2016-01-03 00:00:00 68 15 ......................... 98
etc (for february and march and so on)
Data for replication:
df1 = pd.DataFrame({'Index': ['2016-01-01 00:00:00', '2016-01-02 00:00:00', '2016-01-03 00:00:00'],
'Value': [99, 105, 68]})
df2 = pd.DataFrame({'Index': [0, 1, 2],
'Date': ['31/01/2016', '28/02/2016', '31/03/2016'],
'Value1': [15, 25, 58],
'Value10': [98, 92, 68]})
CodePudding user response:
You can create a dummy "Month" (using pd.to_datetime
and convert it to 'datetime64[M]'
objects) column for each DataFrame and merge
(with how
parameter set to 'outer'
because we don't want to lose any data) them on it. Finally, remove unwanted columns.
df2['Month'] = pd.to_datetime(df2['Date']).astype('datetime64[M]')
df1['Month'] = pd.to_datetime(df1['Index']).astype('datetime64[M]')
out = (df1.merge(df2, on='Month', how='outer')
.drop(['Month', 'Date', 'Index_y'], axis=1)
.rename(columns={'Index_x':'Date'}))
Output:
Date Value Value1 Value10
0 2016-01-01 00:00:00 99.0 15 98
1 2016-01-02 00:00:00 105.0 15 98
2 2016-01-03 00:00:00 68.0 15 98
3 NaN NaN 25 92
4 NaN NaN 58 68