How to drop duplicates in each group in dataframe?

Time:06-15

I have a dataset:

id1   id2     value
a1    b1     "main"
a1    b1     "main"
a1    b1     "secondary"
a2    b2     "main"
a2    b2     "repair"
a2    b2     "uploaded"
a2    b2     "main"

I want to drop duplicate entries in the "value" column within each (id1, id2) group. The desired result is:

id1   id2     value
a1    b1     "main"
a1    b1     "secondary"
a2    b2     "main"
a2    b2     "repair"
a2    b2     "uploaded"

How can I do that? I know about the drop_duplicates method, but how do I use it together with groupby?

Answer:

Try:

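# For each (id1, id2) group, keep only the first occurrence of each "value".
out = (
    df.groupby(["id1", "id2"])
    .apply(lambda g: g.drop_duplicates("value"))
    .reset_index(drop=True)  # discard the group keys that apply adds to the index
)
print(out)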

Prints:

  id1 id2        value
0  a1  b1       "main"
1  a1  b1  "secondary"
2  a2  b2       "main"
3  a2  b2     "repair"
4  a2  b2   "uploaded"
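
A simpler alternative, as a sketch: since the goal is to drop rows that repeat the same "value" within the same id1/id2 pair, passing all three columns to drop_duplicates as the subset should give the same result without a groupby. The sample DataFrame below is rebuilt from the question's table (without the literal quote characters), and the name `deduped` is only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame(
    {
        "id1": ["a1", "a1", "a1", "a2", "a2", "a2", "a2"],
        "id2": ["b1", "b1", "b1", "b2", "b2", "b2", "b2"],
        "value": ["main", "main", "secondary", "main", "repair", "uploaded", "main"],
    }
)

# A row is a duplicate when id1, id2 and value all match an earlier row;
# by default the first occurrence is kept and the original order is preserved.
deduped = df.drop_duplicates(subset=["id1", "id2", "value"]).reset_index(drop=True)
print(deduped)
```

This also avoids groupby.apply, which recent pandas versions warn about when the applied function touches the grouping columns. If the DataFrame has no other columns, as here, a plain df.drop_duplicates() should behave the same way.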