join data using substring from a column

Time:10-07

I have a dataset df that looks like this:

mandant posnr   aufdat
9   120300001   13.01.20 00:00
9   180200017   13.02.20 00:00
9   180300010   03.02.20 00:00

I want to join this with another dataset, stored in a dictionary as newDict['second'], that looks like this:

mandant nr  jahr
0   1032    2016
0   1291    2016
0   5442    2016
0   5502    2016

Normally, I would do an inner join like this:

df_merged = df.merge(newDict['second'],
                     left_on=['nrkreis_nr'],
                     right_on=['nr'],
                     how='inner')

However, I now need to join on two conditions. The first is the one shown above, and the second is:

newDict['second'].jahr = year of (df.aufdat)

Values in the df.aufdat column originally look like this: 13.01.20 00:00. How can I combine the two join conditions in this case, especially since I need to join on a substring rather than the full string?

CodePudding user response:

Create a new column called jahr in df by converting aufdat to datetimes and taking Series.dt.year, then add this column to both the left_on and right_on parameters:

df_merged = (df.assign(jahr = pd.to_datetime(df['aufdat'], format='%d.%m.%y %H:%M').dt.year)
              .merge(newDict['second'], 
                     left_on=['nrkreis_nr','jahr'],
                     right_on=['nr','jahr']))
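
To try the approach end to end, here is a minimal sketch on made-up data. The nrkreis_nr values and the jahr values in `second` are assumptions chosen so that some rows match (the question does not show the nrkreis_nr column), and `second` stands in for newDict['second']. The `suffixes` argument is only there to label the overlapping mandant column.

```python
import pandas as pd

# Hypothetical sample data: nrkreis_nr is not shown in the question,
# so these values are invented for illustration only.
df = pd.DataFrame({
    "mandant": [9, 9, 9],
    "posnr": [120300001, 180200017, 180300010],
    "nrkreis_nr": [1032, 1291, 5442],
    "aufdat": ["13.01.20 00:00", "13.02.20 00:00", "03.02.20 00:00"],
})

# Stand-in for newDict['second']; the jahr values are assumed.
second = pd.DataFrame({
    "mandant": [0, 0, 0],
    "nr": [1032, 1291, 5442],
    "jahr": [2020, 2016, 2020],
})

# Parse the date strings and keep only the year (an integer column).
jahr = pd.to_datetime(df["aufdat"], format="%d.%m.%y %H:%M").dt.year

# Inner join on both conditions: nrkreis_nr == nr and jahr == jahr.
df_merged = df.assign(jahr=jahr).merge(
    second,
    left_on=["nrkreis_nr", "jahr"],
    right_on=["nr", "jahr"],
    how="inner",
    suffixes=("_df", "_second"),
)

print(df_merged[["posnr", "nr", "jahr"]])
```

Only the rows where both the number and the year agree survive the join. If jahr in the second dataset is stored as strings rather than integers, it would likely need converting (for example with astype(int)) before merging.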