Is there a way to group a dataframe by 2 columns (Id, Name) and, if a certain string "x_1" appears in the column "Name" more than once, keep only its first row (first occurrence)?
Id Name Value
1 x_1 23
1 x_2 24
1 x_1 23
1 x_3 27
1 x_4 28
1 x_3 29
1 x_4 30
Desired output
Id Name Value
1 x_1 23
1 x_2 24
1 x_3 27
1 x_4 28
1 x_3 29
1 x_4 30
My attempt also removes the repeated x_3 and x_4 rows, which I want to keep: df.drop_duplicates(subset=['Id', 'Name'], keep='first')
CodePudding user response:
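Let us use duplicated on both key columns, combined with a check for 'x_1', so that only the repeated x_1 rows are dropped. (Checking duplicated('Id') alone works for this sample only because x_1 happens to be the first row for its Id.)
df[~(df.duplicated(['Id', 'Name']) & df['Name'].eq('x_1'))]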
Id Name Value
0 1 x_1 23
1 1 x_2 24
3 1 x_3 27
4 1 x_4 28
5 1 x_3 29
6 1 x_4 30
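For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question and wraps the mask in a small helper. The helper name drop_repeats_of and its parameters are hypothetical, chosen only for illustration. It assumes the goal is to drop later repeats of one specific Name value within each (Id, Name) pair and leave every other row untouched.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 1, 1, 1, 1],
    "Name": ["x_1", "x_2", "x_1", "x_3", "x_4", "x_3", "x_4"],
    "Value": [23, 24, 23, 27, 28, 29, 30],
})


def drop_repeats_of(frame, name, keys=("Id", "Name")):
    """Hypothetical helper: drop later repeats of `name`, keep all other rows."""
    # True for every row whose key combination has already appeared earlier.
    is_repeat = frame.duplicated(subset=list(keys), keep="first")
    # True only for rows carrying the target name.
    is_target = frame["Name"].eq(name)
    # Remove rows that are both a repeat and the target name.
    return frame[~(is_repeat & is_target)]


result = drop_repeats_of(df, "x_1")
print(result)
```

Because the duplicate check uses both Id and Name, the first x_1 row is kept even when it is not the first row of its Id, and the repeated x_3 and x_4 rows are left in place.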