Home > Net >  Append column from one dataframe to another for rows that match in both dataframes
Append column from one dataframe to another for rows that match in both dataframes

Time:11-15

I have two dataframes A and B that contain different sets of patient data, and need to append certain columns from B to A - however only for those rows that contain information from the same patient and visit, i.e. where A and B have a matching value in two particular columns. B is longer than A, not all rows in A are contained in B. I don't know how this would be possible without looping, but many people discourage from looping over pandas dataframes (apart from the fact that my loop solution does not work because "Can only compare identically-labeled Series objects"). I read the options here How to iterate over rows in a DataFrame in Pandas but don't see which one I could apply here and would appreciate any tips!

Toy example (the actual dataframe has about 300 rows):


dict_A = {
    'ID': ['A_190792','X_210392','B_050490','F_311291','K_010989'],
    'Visit_Date': ['2010-10-31','2011-09-24','2010-30-01','2012-01-01','2013-08-13'],
    'Score': [30, 23, 42, 23, 31],
}
A = pd.DataFrame(dict_A) 


dict_B = {
    'ID': ['G_090891','A_190792','Z_060791','X_210392','B_050490','F_311291','K_010989','F_230989'],
    'Visit_Date': ['2013-03-01','2010-10-31','2013-04-03','2011-09-24','2010-30-01','2012-01-01','2013-08-13','2014-09-09'],
    'Diagnosis': ['F12', 'G42', 'F34', 'F90', 'G98','G87','F23','G42'],
}
B = pd.DataFrame(dict_B) 


for idx, row in A.iterrows():
    A.loc[row,'Diagnosis'] = B['Diagnosis'][(B['Visit_Date']==A['Visit_Date']) & (B['ID']==A['ID'])]
# Appends Diagnosis column from B to A for rows where ID and date match

I have seen this question Append Columns to Dataframe 1 Based on Matching Column Values in Dataframe 2 but the only answer is quite specific to it and also does not address the question whether a loop can/should be used or not

CodePudding user response:

i think you can use merge:

A['Visit_Date']=pd.to_datetime(A['Visit_Date'])
B['Visit_Date']=pd.to_datetime(B['Visit_Date'])
final=A.merge(B,on=['Visit_Date','ID'],how='outer')
print(final)
'''

         ID  Visit_Date  Score Diagnosis
0  A_190792  2010-10-31   30.0       G42
1  X_210392  2011-09-24   23.0       F90
2  B_050490  2010-30-01   42.0       G98
3  F_311291  2012-01-01   23.0       G87
4  K_010989  2013-08-13   31.0       F23
5  G_090891  2013-03-01    NaN       F12
6  Z_060791  2013-04-03    NaN       F34
7  F_230989  2014-09-09    NaN       G42
'''

if you want to only A:

A['Visit_Date']=pd.to_datetime(A['Visit_Date'])
B['Visit_Date']=pd.to_datetime(B['Visit_Date'])
final=A.merge(B,on=['Visit_Date','ID'],how='left')
print(final)
'''
         ID  Visit_Date  Score Diagnosis
0  A_190792  2010-10-31     30       G42
1  X_210392  2011-09-24     23       F90
2  B_050490  2010-30-01     42       G98
3  F_311291  2012-01-01     23       G87
4  K_010989  2013-08-13     31       F23
'''
  • Related