Home > OS >  Merging pandas dataframes to fill in the gaps
Merging pandas dataframes to fill in the gaps

Time:06-22

Have been struggling with this for a bit today. I've got a master dataframe that is missing some values, and a secondary one that has these values which I would like to add in. The key to match on is column 1.

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1
       1  2
0   Test  A
1  Test1  B
2  Test2  C

df2
           1     2     3
0  Something     z  Blah
1       Test  None  Blah
2      Test1  None  Blah
3      Test2  None  Blah
4      Test3     x  Blah
5      Test4     y  Blah


The outcome I'm looking for is:

           1     2     3
0  Something     z  Blah
1       Test     A  Blah
2      Test1     B  Blah
3      Test2     C  Blah
4      Test3     x  Blah
5      Test4     y  Blah

Any ideas?

CodePudding user response:

You can use a map and fillna:

df2[2] = df2[2].fillna(df2[1].map(df1.set_index(1)[2]))

Output:

          1  2     3
0  Something  z  Blah
1       Test  A  Blah
2      Test1  B  Blah
3      Test2  C  Blah
4      Test3  x  Blah
5      Test4  y  Blah

CodePudding user response:

usr from this code:

import pandas as pd
df = pd.merge(df2, df1, on='1', how='left')
for i in df.index:
    if df['2_x'][i] is None:
        df['2_x'][i]=df['2_y'][i]

then you can remove extra column from your dataframe

CodePudding user response:

You can use a pd.merge and np.where() to accomplish this

import pandas as pd
import numpy as np

df_merge = pd.merge(df2, df1, how = 'left', left_on = 1, right_on = 1, suffixes=('', '_y'))
df_merge['2'] = np.where(df_merge['2'].isna(), df_merge['2_y'], df_merge['2'])
df_merge = df_merge[[1, '2', 3]]
df_merge

CodePudding user response:

Here is one way about it

df3=df2.merge(df1, on=1, how='left',  suffixes=("",'_y') )
df3['2'] = np.where(df3['2'].isna(), df3['2_y'], df3['2'])
df3.drop(columns='2_y')

OR

df3=df2.merge(df1, on=1, how='left',  suffixes=("",'_y') )
idx = df3[df3['2'].isnull() == True].index
df3.iloc[idx, 1]  = df3.iloc[idx,3]
df3.drop(columns='2_y')
         1      2   3
0   Something   z   Blah
1   Test        A   Blah
2   Test1       B   Blah
3   Test2       C   Blah
4   Test3       x   Blah
5   Test4       y   Blah

CodePudding user response:

Using pandas apply to multiple columns (ref: Pandas Tricks — Pass Multiple Columns To Lambda | Medium)

d1 = {1:['Test','Test1','Test2'], 2:['A','B','C']}
d2 = {1:['Something','Test','Test1','Test2','Test3','Test4'], 2:['z',None,None,None,'x','y'],3:['Blah','Blah','Blah','Blah','Blah','Blah']}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1_dict = {k:v for k,v in df1.values}
df2_new = df2.copy()
df2_new[2] = df2_new.apply(lambda x : df1_dict.get(x[1]) if not x[2] else x[2], axis=1)
df2_new 

    1   2   3
0   Something   z   Blah
1   Test        A   Blah
2   Test1       B   Blah
3   Test2       C   Blah
4   Test3       x   Blah
5   Test4       y   Blah
  • Related