Drop duplicates when a string is present more than once in a column for a group (pandas)

Time:07-20

Is there a way to group by 2 columns (Id, Name) in a dataframe and, if a certain string "x_1" appears in the column "Name" more than once, keep only the first row (first occurrence)?

Id Name Value
1  x_1  23
1  x_2  24
1  x_1  23
1  x_3  27
1  x_4  28
1  x_3  29
1  x_4  30

Desired output

   Id Name Value
    1  x_1  23
    1  x_2  24
    1  x_3  27
    1  x_4  28
    1  x_3  29
    1  x_4  30

This also removes the repeated x_3 and x_4 rows, which I want to keep: df.drop_duplicates(subset=['Id', 'Name'], keep='first')

CodePudding user response:

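Let us use duplicated on both Id and Name, and drop a row only when it is a repeat and its Name is x_1 (checking Id alone would also drop the first x_1 whenever it is not the first row of its Id):

df[~(df.duplicated(['Id', 'Name']) & df['Name'].eq('x_1'))]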

   Id Name  Value
0   1  x_1     23
1   1  x_2     24
3   1  x_3     27
4   1  x_4     28
5   1  x_3     29
6   1  x_4     30
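
For anyone who wants to run this end to end, here is a minimal sketch built on the sample data from the question. The helper name drop_repeats_of and its target parameter are made up for illustration and are not part of pandas. It checks duplicates on both Id and Name, so the first x_1 row in a group is kept even when it is not the first row of that group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 1, 1, 1],
    "Name": ["x_1", "x_2", "x_1", "x_3", "x_4", "x_3", "x_4"],
    "Value": [23, 24, 23, 27, 28, 29, 30],
})


def drop_repeats_of(frame, target="x_1"):
    """Hypothetical helper: drop repeated rows only for one Name value."""
    # True for every (Id, Name) pair that has already appeared earlier.
    repeated = frame.duplicated(subset=["Id", "Name"], keep="first")
    # True only for rows whose Name equals the target string.
    is_target = frame["Name"].eq(target)
    # Keep everything except repeated rows of the target string.
    return frame[~(repeated & is_target)]


result = drop_repeats_of(df)
print(result)
```

The original index labels are kept, so the dropped row shows up as a gap at index 2. Chain .reset_index(drop=True) if a clean 0..n-1 index is needed.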