Home > Software engineering >  Add a column to a dataframe from another dataframe based on conditions in both dataframes
Add a column to a dataframe from another dataframe based on conditions in both dataframes

Time:10-10

The following data has been extracted from two larger dataframes. df1

{'NID': {7: '212017463',
  8: '212017463',
  9: '212017463',
  10: '212017463',
  11: '212017463',
  12: '212017463',
  13: '212017463'},
 'Lact': {7: 1, 8: 2, 9: 3, 10: 4, 11: 5, 12: 6, 13: 7},
 'M-305': {7: 5018, 8: 5479, 9: 7134, 10: 9046, 11: 8972, 12: 9360, 13: 5295},
 'PI-M': {7: 94, 8: 83, 9: 96, 10: 116, 11: 117, 12: 112, 13: 0}}

df 2

{'NID': {1: '212017463', 2: '212017463', 3: '212017463'},
 'Lact': {1: 7, 2: 7, 3: 7}}

I would like to merge the df1 series M-305 and PI-M from the previous lactation (Lact == 6) with df2 (Lact =7).

if I do a simple merge


df3 = pd.merge(df2, df1[['NID', 'Lact', 'M-305', 'PI-M']], on=['NID', 'Lact'], how='left')
df3

I get M-305 and PI-M merged for lactation (Lact) 7 giving

    
    NID Lact    M-305   PI-M
0   212017463   7   5295    0
1   212017463   7   5295    0
2   212017463   7   5295    0

The output I would like to get is M-305 = 9360 and PI-M = 110 (Lact 6) values

Is there a way to merge lact with lact-1

Thanks

John

CodePudding user response:

IF you are used to SQL and the flexibility of its JOIN operations, then unfortunately Pandas doesn't match those. Pandas only offers equi-join (where the join columns on both frames must match) and limited range join (in the form of merge_asof).

You can calculate the join column manually:

pd.merge(
    df2.assign(Lact_Merge = df2['Lact'] - 1),
    df1[['NID', 'Lact', 'M-305', 'PI-M']],
    left_on=['NID', 'Lact_Merge'],
    right_on=['NID', 'Lact'],
    how='left'
).drop(columns=['Lact_Merge', 'Lact_y']).rename(columns={'Lact_x': 'Lact'})

(I'm using "join" and "merge" interchangably. I have a background SQL Server where such operations are equivalent to INNER JOIN, LEFT JOIN, ...)

  • Related