Dataframe 1 :
id status
A Pass
A P_Pass
A C_Pass
B Fail
B A_Fail
Dataframe 2 :
id Category group
A pxe 1
B fxe 2
After merging the Dataframe 2 on Dataframe 1 with left join, final Dataframe becomes :
id status Category group
A Pass pxe 1
A P_Pass pxe 1
A C_Pass pxe 1
B Fail fxe 2
B A_Fail fxe 2
Expected Dataframe is :
id status Category group
A Pass pxe 1
A P_Pass - -
A C_Pass - -
B Fail fxe 2
B A_Fail - -
i want to mask the duplicate entries and make it appear only once
code used ,
df_final = pd.merge(df_1 , df_2 , on = 'id' , how = 'left')
CodePudding user response:
you use np.where and cumcount after merging:
#df_final
id status Category group
0 A Pass pxe 1
1 A P_Pass pxe 1
2 A C_Pass pxe 1
3 B Fail fxe 2
4 B A_Fail fxe 2
df_final['Category'] = np.where(df_final.groupby('Category').cumcount().eq(0), df_final['Category'], '-')
df_final['group'] = np.where(df_final.groupby('group').cumcount().eq(0), df_final['group'], '-')
#-->>
#df_final
id status Category group
0 A Pass pxe 1
1 A P_Pass - -
2 A C_Pass - -
3 B Fail fxe 2
4 B A_Fail - -
CodePudding user response:
here is one way to do it
do the merge as usual, and then using mask, compare the row with previous row and replace duplicate entries with the hyphen
df=df[['id','status']].merge(df2 ,
on = 'id' ,
how = 'left')
df[['Category','group']]=df[['Category','group']].mask(df['Category'].eq(df['Category'].shift(1)), '-' )
df
id status Category group
0 A Pass pxe 1
1 A P_Pass - -
2 A C_Pass - -
3 B Fail fxe 2
4 B A_Fail - -