How to drop duplicates in each group in a dataframe?

Time:06-16

I have the following dataset:

id1   id2     value
a1    b1     "main"
a1    b1     "main"
a1    b1     "secondary"
a2    b2     "main"
a2    b2     "repair"
a2    b2     "uploaded"
a2    b2     "main"

I want to drop duplicate entries in the value column within each (id1, id2) group, so the desired result is:

id1   id2     value
a1    b1     "main"
a1    b1     "secondary"
a2    b2     "main"
a2    b2     "repair"
a2    b2     "uploaded"

How could I do that? I know the method drop_duplicates, but how can I use it with groupby?

CodePudding user response:

Try:

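# Drop repeated "value" entries inside each (id1, id2) group.
out = (
    df.groupby(["id1", "id2"])
    .apply(lambda g: g.drop_duplicates("value"))
    .reset_index(drop=True)  # discard the index produced by apply
)
print(out)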

Prints:

  id1 id2        value
0  a1  b1       "main"
1  a1  b1  "secondary"
2  a2  b2       "main"
3  a2  b2     "repair"
4  a2  b2   "uploaded"
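
Because the group keys are ordinary columns here, the same result can likely be had without groupby at all, by listing the group columns together with value in the subset passed to drop_duplicates. A minimal sketch, assuming the question's data is rebuilt by hand with plain string values (the df construction and the name result are illustrative, not from the original post):

```python
import pandas as pd

# Rebuild the question's data (assumption: values are plain strings).
df = pd.DataFrame(
    {
        "id1": ["a1", "a1", "a1", "a2", "a2", "a2", "a2"],
        "id2": ["b1", "b1", "b1", "b2", "b2", "b2", "b2"],
        "value": ["main", "main", "secondary", "main", "repair", "uploaded", "main"],
    }
)

# A row counts as a duplicate when id1, id2 and value all match an earlier row,
# which is equivalent to dropping repeated values inside each (id1, id2) group.
# The default keep="first" retains the first occurrence in each group.
result = df.drop_duplicates(subset=["id1", "id2", "value"]).reset_index(drop=True)
print(result)
```

This keeps the original row order and avoids groupby.apply, which recent pandas releases warn about when the applied function operates on the grouping columns.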