Condense dataset pandas

Time:11-23

I want to condense my dataset so that each distinct row appears only once. Essentially, it is a groupby.

Data

id  box     status
aa  box11   hey
aa  box11   hey
aa  box11   hey
aa  box11   hey
aa  box5    hello
aa  box5    hello
aa  box5    hello
aa  box5    hello
aa  box5    hello
bb  box8    no
bb  box8    no

Desired

id  box     status
aa  box11   hey
aa  box5    hello
bb  box8    no

Doing

df1 = df.groupby(["id"])["box"].agg()

CodePudding user response:

Use DataFrame.drop_duplicates(), which removes every row that repeats an earlier row and keeps the first occurrence.
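
As a minimal sketch, the sample data from the question can be rebuilt and deduplicated as follows. The reset_index(drop=True) call is an optional extra that was not part of the original answer; it only renumbers the remaining rows.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "id": ["aa"] * 9 + ["bb"] * 2,
    "box": ["box11"] * 4 + ["box5"] * 5 + ["box8"] * 2,
    "status": ["hey"] * 4 + ["hello"] * 5 + ["no"] * 2,
})

# Drop rows that exactly repeat an earlier row; the first occurrence is kept.
# reset_index(drop=True) renumbers the surviving rows from 0 (optional).
df1 = df.drop_duplicates().reset_index(drop=True)
print(df1)
```

With this data, df1 should contain the three rows listed under "Desired".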

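If you want duplicates to be judged on only some of the columns, for example ignoring "id", you can use the subset keyword. Note that rows with the same box and status but different ids would then collapse into a single row: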

df1 = df.drop_duplicates(subset = ['box', 'status'])
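
Since the question frames this as a groupby, here is a hedged sketch comparing the subset call with one possible groupby version. The names by_subset and by_group are illustrative, and the groupby variant is an alternative I am assuming fits the data, not something the answer proposed. It takes the first status seen for each (id, box) pair, which matches only because status is constant within each pair here.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": ["aa"] * 9 + ["bb"] * 2,
    "box": ["box11"] * 4 + ["box5"] * 5 + ["box8"] * 2,
    "status": ["hey"] * 4 + ["hello"] * 5 + ["no"] * 2,
})

# Deduplicate by comparing only the "box" and "status" columns.
by_subset = df.drop_duplicates(subset=["box", "status"]).reset_index(drop=True)

# groupby alternative: one row per (id, box) pair, keeping the first status.
# sort=False preserves the order in which the groups first appear;
# as_index=False keeps "id" and "box" as ordinary columns.
by_group = df.groupby(["id", "box"], as_index=False, sort=False).first()

print(by_subset)
print(by_group)
```

For simply removing repeated rows, drop_duplicates is the more direct choice; groupby becomes useful when the repeated rows also need to be aggregated, for example counted.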