Have been struggling with this for a bit today. I've got a master dataframe that is missing some values, and a secondary one that has these values which I would like to add in. The key to match on is column 1.
d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
df1
1 2
0 Test A
1 Test1 B
2 Test2 C
df2
1 2 3
0 Something z Blah
1 Test None Blah
2 Test1 None Blah
3 Test2 None Blah
4 Test3 x Blah
5 Test4 y Blah
The outcome I'm looking for is:
1 2 3
0 Something z Blah
1 Test A Blah
2 Test1 B Blah
3 Test2 C Blah
4 Test3 x Blah
5 Test4 y Blah
Any ideas?
CodePudding user response:
You can use a map
and fillna
:
df2[2] = df2[2].fillna(df2[1].map(df1.set_index(1)[2]))
Output:
1 2 3
0 Something z Blah
1 Test A Blah
2 Test1 B Blah
3 Test2 C Blah
4 Test3 x Blah
5 Test4 y Blah
CodePudding user response:
usr from this code:
import pandas as pd
df = pd.merge(df2, df1, on='1', how='left')
for i in df.index:
if df['2_x'][i] is None:
df['2_x'][i]=df['2_y'][i]
then you can remove extra column from your dataframe
CodePudding user response:
You can use a pd.merge
and np.where()
to accomplish this
import pandas as pd
import numpy as np
df_merge = pd.merge(df2, df1, how = 'left', left_on = 1, right_on = 1, suffixes=('', '_y'))
df_merge['2'] = np.where(df_merge['2'].isna(), df_merge['2_y'], df_merge['2'])
df_merge = df_merge[[1, '2', 3]]
df_merge
CodePudding user response:
Here is one way about it
df3=df2.merge(df1, on=1, how='left', suffixes=("",'_y') )
df3['2'] = np.where(df3['2'].isna(), df3['2_y'], df3['2'])
df3.drop(columns='2_y')
OR
df3=df2.merge(df1, on=1, how='left', suffixes=("",'_y') )
idx = df3[df3['2'].isnull() == True].index
df3.iloc[idx, 1] = df3.iloc[idx,3]
df3.drop(columns='2_y')
1 2 3
0 Something z Blah
1 Test A Blah
2 Test1 B Blah
3 Test2 C Blah
4 Test3 x Blah
5 Test4 y Blah
CodePudding user response:
Using pandas apply
to multiple columns (ref: Pandas Tricks — Pass Multiple Columns To Lambda | Medium)
d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
df1_dict = {k:v for k,v in df1.values}
df2_new = df2.copy()
df2_new[2] = df2_new.apply(lambda x : df1_dict.get(x[1]) if not x[2] else x[2], axis=1)
df2_new
1 2 3
0 Something z Blah
1 Test A Blah
2 Test1 B Blah
3 Test2 C Blah
4 Test3 x Blah
5 Test4 y Blah