Home > Enterprise >  mask duplicate entries while merging in pandas
mask duplicate entries while merging in pandas

Time:09-19

Dataframe 1 :

id    status   
A     Pass          
A     P_Pass       
A     C_Pass 
B     Fail
B     A_Fail

Dataframe 2 :

id    Category     group
A        pxe         1
B        fxe         2

After merging the Dataframe 2 on Dataframe 1 with left join, final Dataframe becomes :

id    status    Category   group
A     Pass          pxe      1
A     P_Pass        pxe      1
A     C_Pass        pxe      1
B     Fail          fxe      2
B     A_Fail        fxe      2

Expected Dataframe is :

id    status    Category   group
A     Pass          pxe      1
A     P_Pass         -       -
A     C_Pass         -       -
B     Fail          fxe      2
B     A_Fail         -       -

i want to mask the duplicate entries and make it appear only once

code used ,

df_final = pd.merge(df_1 , df_2 , on = 'id' , how = 'left')

CodePudding user response:

you use np.where and cumcount after merging:

#df_final
    id  status  Category    group
0   A   Pass    pxe         1
1   A   P_Pass  pxe         1
2   A   C_Pass  pxe         1
3   B   Fail    fxe         2
4   B   A_Fail  fxe         2

df_final['Category'] = np.where(df_final.groupby('Category').cumcount().eq(0), df_final['Category'], '-')
df_final['group'] = np.where(df_final.groupby('group').cumcount().eq(0), df_final['group'], '-')

#-->>
#df_final
    id  status  Category    group
0   A   Pass    pxe         1
1   A   P_Pass  -           -
2   A   C_Pass  -           -
3   B   Fail    fxe         2
4   B   A_Fail  -           -

CodePudding user response:

here is one way to do it

do the merge as usual, and then using mask, compare the row with previous row and replace duplicate entries with the hyphen

df=df[['id','status']].merge(df2 , 
         on = 'id' , 
         how = 'left')

df[['Category','group']]=df[['Category','group']].mask(df['Category'].eq(df['Category'].shift(1)), '-' )
df

        id  status  Category    group
0        A  Pass         pxe    1
1        A  P_Pass       -      -
2        A  C_Pass       -      -
3        B  Fail         fxe    2
4        B  A_Fail       -      -
  • Related