Home > other >  Combining two dataframes (one should have a downscaled date, and other has a daily date)
Combining two dataframes (one should have a downscaled date, and other has a daily date)

Time:01-14

I have two dataframes df1 and df2. df1 has a daily time frame (with hours, minutes and seconds) and the other has a monthly time frame as shown below:

df1:

Index                    Value
2016-01-01 00:00:00     99
2016-01-02 00:00:00     105
2016-01-03 00:00:00      68

df2 (which needs downscaling to daily):

Index         Date            Value1           .............................Value 10
 0              31/01/2016        15                 ............................ 98                     
 1              28/02/2016        25                 ............................ 92
 2              31/03/2016        58                 ............................ 68

I want to have an output like this:

Index            Date       Value   Value1     ..........................Value 10
     0  2016-01-01 00:00:00    99   15         ......................... 98   
     1  2016-01-02 00:00:00    105  15         ......................... 98
     2  2016-01-03 00:00:00    68   15         ......................... 98

etc (for february and march and so on)

Data for replication:

df1 = pd.DataFrame({'Index': ['2016-01-01 00:00:00', '2016-01-02 00:00:00', '2016-01-03 00:00:00'],
                    'Value': [99, 105, 68]})

df2 = pd.DataFrame({'Index': [0, 1, 2],
                    'Date': ['31/01/2016', '28/02/2016', '31/03/2016'],
                    'Value1': [15, 25, 58],
                    'Value10': [98, 92, 68]})

CodePudding user response:

You can create a dummy "Month" (using pd.to_datetime and convert it to 'datetime64[M]' objects) column for each DataFrame and merge (with how parameter set to 'outer' because we don't want to lose any data) them on it. Finally, remove unwanted columns.

df2['Month'] = pd.to_datetime(df2['Date']).astype('datetime64[M]')
df1['Month'] = pd.to_datetime(df1['Index']).astype('datetime64[M]')

out = (df1.merge(df2, on='Month', how='outer')
       .drop(['Month', 'Date', 'Index_y'], axis=1)
       .rename(columns={'Index_x':'Date'}))

Output:

                  Date  Value  Value1  Value10
0  2016-01-01 00:00:00   99.0      15       98
1  2016-01-02 00:00:00  105.0      15       98
2  2016-01-03 00:00:00   68.0      15       98
3                  NaN    NaN      25       92
4                  NaN    NaN      58       68
  •  Tags:  
  • Related