Home > Blockchain >  Merging two DataFrames with date and other column
Merging two DataFrames with date and other column

Time:03-22

I am trying to merge two DataFrame on Date and column called IDV. Here is my first DataFrame

df1

Date IDV Values
01/01/2020 Var1 100
01/01/2020 Var2 500
01/01/2020 Var3 600
01/01/2020 Var4 10
01/01/2020 Var5 10

df2

Date IDV Values
01/01/2019 Var2 110
01/01/2019 Var3 510
01/01/2019 Var1 300
01/01/2019 Var5 20
01/01/2019 Var4 20

my desired output would be

Date IDV Values Last_Year_Values
01/01/2020 Var1 100 300
01/01/2020 Var2 500 110
01/01/2020 Var3 600 510
01/01/2020 Var4 10 20
01/01/2020 Var5 10 20

I tried pd.merge(df1,df2,left_on ='date',right_on ='IDV', how = 'left')

CodePudding user response:

Use DateOffset if Date is already datetime64:

cols = ['Date', 'IDV', 'Values']
out = df1.merge(df2[cols].assign(Date=df2['Date'] pd.DateOffset(years=1)), 
                on=['Date', 'IDV'], how='left', suffixes=('', '_last_year'))
print(out)

# Output
        Date   IDV  Values  Values_last_year
0 2020-01-01  Var1     100               300
1 2020-01-01  Var2     500               110
2 2020-01-01  Var3     600               510
3 2020-01-01  Var4      10                20
4 2020-01-01  Var5      10                20

Setup:

import pandas as pd

d1 = {'Date': [pd.Timestamp('2020-01-01'),
               pd.Timestamp('2020-01-01'),
               pd.Timestamp('2020-01-01'),
               pd.Timestamp('2020-01-01'),
               pd.Timestamp('2020-01-01')],
              'IDV': ['Var1', 'Var2', 'Var3', 'Var4', 'Var5'],
              'Values': [100, 500, 600, 10, 10]}
df1 = pd.DataFrame(d1)

d2 = {'Date': [pd.Timestamp('2019-01-01'),
               pd.Timestamp('2019-01-01'),
               pd.Timestamp('2019-01-01'),
               pd.Timestamp('2019-01-01'),
               pd.Timestamp('2019-01-01')],
              'IDV': ['Var2', 'Var3', 'Var1', 'Var5', 'Var4'],
              'Values': [110, 510, 300, 20, 20]}
df2 = pd.DataFrame(d2)

CodePudding user response:

Assuming a string type, a simple method would be to change the year in df2:

pd.merge(df1,
         df2.assign(Date=df2['Date'].str.replace('2019', '2020')),
         on=['Date', 'IDV'],
         how='left', suffixes=('', '_last_year'))

or for a more generic method (works with any year):

pd.merge(df1,
         df2.assign(Date=df2['Date'].str.replace(r'\d $', lambda m: str(int(m.group(0)) 1), regex=True)),
         on=['Date', 'IDV'],
         how='left', suffixes=('', '_last_year'))

output:

         Date   IDV  Values  Values_last_year
0  01/01/2020  Var1     100               300
1  01/01/2020  Var2     500               110
2  01/01/2020  Var3     600               510
3  01/01/2020  Var4      10                20
4  01/01/2020  Var5      10                20
  • Related