I have a dataframe like this:
sku
FAT-001
FAT-001
FAT-001
FAT-002
FAT-002
I want to create another column based on the duplicated sku values. The first occurrence of each sku must be empty in the dup-sku column; only the repeated occurrences should keep the sku there. So my expected dataframe looks like this:
sku dup-sku
FAT-001 #empty
FAT-001 FAT-001
FAT-001 FAT-001
FAT-002 #empty
FAT-002 FAT-002
FAT-003
The first occurrence of each duplicated value must be empty in the dup-sku column.
CodePudding user response:
Would this work for your example?
df['dup'] = df['sku']
# Blank the first occurrence of each sku; a single .loc call avoids chained assignment
df.loc[~df['sku'].duplicated(keep='first'), 'dup'] = ''
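For anyone who wants to try this, here is a minimal self-contained sketch on the sample data from the question. It assumes FAT-003 appears exactly once (the question only shows it in the expected output), and it assigns through one .loc call on the frame itself, which sidesteps pandas' chained-assignment warnings.

```python
import pandas as pd

# Sample data from the question; the single FAT-003 row is an assumption
df = pd.DataFrame(
    {"sku": ["FAT-001", "FAT-001", "FAT-001", "FAT-002", "FAT-002", "FAT-003"]}
)

# True for the first occurrence of each sku, False for later repeats
first_seen = ~df["sku"].duplicated(keep="first")

# Copy the column, then blank out the first occurrences in one .loc assignment
df["dup"] = df["sku"]
df.loc[first_seen, "dup"] = ""

print(df)
```

With this data, a sku that occurs only once (FAT-003) also ends up with an empty dup value, since its single row counts as a first occurrence.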
CodePudding user response:
df['dup-sku'] = df.sku.mask(~df.sku.duplicated(), '')
df.sku.duplicated() generates a bool series that marks every duplicated value except the first occurrence (the default value of keep is 'first').
Its negation (~) is therefore True exactly at the first occurrences. It is used as the condition in mask, which sets the empty string (the second argument) in the elements indicated by True values and leaves the others unchanged.
I assume that you want just the empty string in these rows, not NaN as proposed in one of the comments.
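To see how the pieces fit, here is a small sketch on the question's sample data (the single FAT-003 row is an assumption). Note that the condition passed to mask is negated, so the first occurrence of each sku is the one that gets blanked, as the question asks. The last line shows the NaN variant for comparison: leaving out the second argument of mask gives missing values instead of empty strings.

```python
import pandas as pd

# Sample data from the question; the single FAT-003 row is an assumption
df = pd.DataFrame(
    {"sku": ["FAT-001", "FAT-001", "FAT-001", "FAT-002", "FAT-002", "FAT-003"]}
)

# True for every repeat of a sku, False for its first occurrence
dup = df["sku"].duplicated()

# mask replaces values where the condition is True, so negate dup
# to blank the first occurrences and keep the repeats
df["dup-sku"] = df["sku"].mask(~dup, "")

# Same idea without a replacement value: first occurrences become missing (NaN)
with_nan = df["sku"].mask(~dup)

print(df)
print(with_nan)
```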