I have a dataset df that looks like this:
mandant posnr aufdat
9 120300001 13.01.20 00:00
9 180200017 13.02.20 00:00
9 180300010 03.02.20 00:00
I want to join this with another dataset, stored in a dictionary as newDict['second'], which looks like this:
mandant nr jahr
0 1032 2016
0 1291 2016
0 5442 2016
0 5502 2016
Normally, I would do an inner join like this:
df_merged = df.merge(newDict['second'],
                     left_on=['nrkreis_nr'],
                     right_on=['nr'],
                     how='inner')
However, I now need to join on two conditions. The first is the one shown above, and the second is
newDict['second'].jahr = year of (df.aufdat)
The values in the df.aufdat column originally look like 13.01.20 00:00. How can I use the two join conditions together in this case, especially since I need to join on a substring rather than the full string?
CodePudding user response:
Create a new column called jahr in df by converting aufdat to datetimes and taking Series.dt.year, then add this column to both the left_on and right_on parameters:
df_merged = (df.assign(jahr=pd.to_datetime(df['aufdat'], format='%d.%m.%y %H:%M').dt.year)
               .merge(newDict['second'],
                      left_on=['nrkreis_nr', 'jahr'],
                      right_on=['nr', 'jahr']))