I have the following problem: i am trying to join df1 = ['ID, 'Earnings', 'WC, 'Year']
and df2 = ['ID', 'F1_Earnings', 'df2_year']
. So for example: the 'F1_Earnings' of a particular company, e.g. with ID = 1
and year = 1996
, in df2 (aka. the Forward Earnings) should get joined on df1 in a way that they show up in df1 under ID = 1
and year = 1995
.
I have no clue how to specify a join on two conditions, of course they need to join on "ID", but how do I add a second condition which specifies that they also join on "df1_year = df2_year - 1"?
d1 = {'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], 'Earnings': [100, 200, 400, 250, 300, 350, 400, 550, 700, 259, 300, 350], 'WC': [20, 40, 35, 55, 60, 65, 30, 28, 32, 45, 60, 52], 'Year': [1995, 1996, 1997, 1996, 1997, 1998, 1995, 1997, 1998, 1996, 1997, 1998]}
df1 = pd.DataFrame(data=d1)
d2 = {'ID': [1, 2, 3, 4], 'F1_Earnings': [120, 220, 420, 280], 'WC': [23, 37, 40, 52], 'Year': [1996, 1997, 1998, 1999]}
df2 = pd.DataFrame(data=d2)
I did the following, but I guess there miust be a smarter way? I am afraid it wont work for larger datasets...:
df3 = pd.merge(df1, df2, how='left', on = 'ID')
df3.loc[df3['Year_x'] == df3['Year_y'] - 1]
CodePudding user response:
You can use a Series as key in merge
:
df1.merge(df2, how='left',
left_on=['ID', 'Year'],
right_on=['ID', df2['Year'].sub(1)])
output:
ID Year Earnings WC_x Year_x F1_Earnings WC_y Year_y
0 1 1995 100 20 1995 120.0 23.0 1996.0
1 1 1996 200 40 1996 NaN NaN NaN
2 1 1997 400 35 1997 NaN NaN NaN
3 2 1996 250 55 1996 220.0 37.0 1997.0
4 2 1997 300 60 1997 NaN NaN NaN
5 2 1998 350 65 1998 NaN NaN NaN
6 3 1995 400 30 1995 NaN NaN NaN
7 3 1997 550 28 1997 420.0 40.0 1998.0
8 3 1998 700 32 1998 NaN NaN NaN
9 4 1996 259 45 1996 NaN NaN NaN
10 4 1997 300 60 1997 NaN NaN NaN
11 4 1998 350 52 1998 280.0 52.0 1999.0
Or change the Year to Year-1, before the merge
:
df1.merge(df2.assign(Year=df2['Year'].sub(1)),
how='left', on=['ID', 'Year'])
output:
ID Earnings WC_x Year F1_Earnings WC_y
0 1 100 20 1995 120.0 23.0
1 1 200 40 1996 NaN NaN
2 1 400 35 1997 NaN NaN
3 2 250 55 1996 220.0 37.0
4 2 300 60 1997 NaN NaN
5 2 350 65 1998 NaN NaN
6 3 400 30 1995 NaN NaN
7 3 550 28 1997 420.0 40.0
8 3 700 32 1998 NaN NaN
9 4 259 45 1996 NaN NaN
10 4 300 60 1997 NaN NaN
11 4 350 52 1998 280.0 52.0