mask duplicate entries while merging in pandas-CodePudding

Dataframe 1 :

id    status   
A     Pass          
A     P_Pass       
A     C_Pass 
B     Fail
B     A_Fail

Dataframe 2 :

id    Category     group
A        pxe         1
B        fxe         2

After merging the Dataframe 2 on Dataframe 1 with left join, final Dataframe becomes :

id    status    Category   group
A     Pass          pxe      1
A     P_Pass        pxe      1
A     C_Pass        pxe      1
B     Fail          fxe      2
B     A_Fail        fxe      2

Expected Dataframe is :

id    status    Category   group
A     Pass          pxe      1
A     P_Pass         -       -
A     C_Pass         -       -
B     Fail          fxe      2
B     A_Fail         -       -

i want to mask the duplicate entries and make it appear only once

code used ,

df_final = pd.merge(df_1 , df_2 , on = 'id' , how = 'left')

CodePudding user response：

you use np.where and cumcount after merging:

#df_final
    id  status  Category    group
0   A   Pass    pxe         1
1   A   P_Pass  pxe         1
2   A   C_Pass  pxe         1
3   B   Fail    fxe         2
4   B   A_Fail  fxe         2

df_final['Category'] = np.where(df_final.groupby('Category').cumcount().eq(0), df_final['Category'], '-')
df_final['group'] = np.where(df_final.groupby('group').cumcount().eq(0), df_final['group'], '-')

#-->>
#df_final
    id  status  Category    group
0   A   Pass    pxe         1
1   A   P_Pass  -           -
2   A   C_Pass  -           -
3   B   Fail    fxe         2
4   B   A_Fail  -           -

CodePudding user response：

here is one way to do it

do the merge as usual, and then using mask, compare the row with previous row and replace duplicate entries with the hyphen

df=df[['id','status']].merge(df2 , 
         on = 'id' , 
         how = 'left')

df[['Category','group']]=df[['Category','group']].mask(df['Category'].eq(df['Category'].shift(1)), '-' )
df

        id  status  Category    group
0        A  Pass         pxe    1
1        A  P_Pass       -      -
2        A  C_Pass       -      -
3        B  Fail         fxe    2
4        B  A_Fail       -      -