Python pandas: how to copy duplicate values to a new column and keep the first duplicate value empty?

Time:05-02

I have a dataframe like this:

  sku
FAT-001
FAT-001
FAT-001
FAT-002
FAT-002

I want to create another column, dup-sku, based on the duplicated sku values. The dup-sku column should contain only the duplicated skus, and the first occurrence of each duplicated value must be empty. So my expected dataframe would look like this:

  sku        dup-sku
FAT-001      #empty 
FAT-001      FAT-001
FAT-001      FAT-001
FAT-002      #empty
FAT-002      FAT-002
FAT-003      

The first occurrence of each duplicated value must be empty in the dup-sku column.

Answer:

Would this work for your example?

df['dup'] = df['sku']
# blank out the first occurrence of each sku in a single .loc call (avoids chained assignment)
df.loc[~df['sku'].duplicated(keep='first'), 'dup'] = ''
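A self-contained sketch of this answer, assuming the sample data from the question (the FAT-003 row is assumed from the expected output, since the input table does not list it). It assigns through a single `df.loc[rows, 'dup']` call; the chained form `df['dup'].loc[...] = ''` can raise a warning and has no effect on `df` when pandas Copy-on-Write is enabled.

```python
import pandas as pd

# Sample data from the question; the FAT-003 row is an assumption
df = pd.DataFrame({"sku": ["FAT-001", "FAT-001", "FAT-001",
                           "FAT-002", "FAT-002", "FAT-003"]})

# Start with a full copy of the sku column
df["dup"] = df["sku"]

# duplicated(keep="first") is False only at the first occurrence of each sku,
# so its negation selects exactly the rows that should be blank
df.loc[~df["sku"].duplicated(keep="first"), "dup"] = ""

print(df)
```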

Answer:

df['dup-sku'] = df.sku.mask(~df.sku.duplicated(), '')

df.sku.duplicated() generates a boolean Series that is True for every repeated value except its first occurrence (the default value of keep is 'first').

Its negation, ~df.sku.duplicated(), is True only at the first occurrence of each sku. It is used as the condition in mask, which puts the empty string (the second argument) into the elements where the condition is True.
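A runnable sketch of this approach, assuming the question's sample data (the FAT-003 row is assumed from the expected output). It shows what `duplicated()` marks and then applies `mask` with the negated condition; without the `~`, `mask(df.sku.duplicated(), '')` would blank the repeats and keep the first occurrence, which is the reverse of the expected output. The `where` line is an equivalent alternative that is not part of the original answer.

```python
import pandas as pd

# Sample data from the question; the FAT-003 row is an assumption
df = pd.DataFrame({"sku": ["FAT-001", "FAT-001", "FAT-001",
                           "FAT-002", "FAT-002", "FAT-003"]})

# True for the 2nd, 3rd, ... occurrence of a sku; False for its first occurrence
is_repeat = df["sku"].duplicated()
print(is_repeat.tolist())  # [False, True, True, False, True, False]

# mask() replaces values where the condition is True, so the condition is
# negated to blank out the first occurrence of each sku
df["dup-sku"] = df["sku"].mask(~is_repeat, "")

# Equivalent alternative: where() keeps values where the condition is True
# and replaces the rest with the given value
alt = df["sku"].where(is_repeat, "")

print(df)
```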

I assume you want an empty string in these rows, not NaN as proposed in one of the comments.
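If NaN is preferred, as that comment suggested, a minimal sketch (same assumed sample data, including the FAT-003 row) is to call `where` without a replacement value; pandas then fills the rows where the condition is False, here the first occurrences, with NaN.

```python
import pandas as pd

# Sample data from the question; the FAT-003 row is an assumption
df = pd.DataFrame({"sku": ["FAT-001", "FAT-001", "FAT-001",
                           "FAT-002", "FAT-002", "FAT-003"]})

# With no replacement value, where() puts NaN wherever the condition is False,
# i.e. at the first occurrence of each sku
df["dup-sku"] = df["sku"].where(df["sku"].duplicated())

print(df["dup-sku"].isna().tolist())  # [True, False, False, True, False, True]
```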
