how to delete columns with a certain count condition

Time:12-22

I'm trying to drop every id that doesn't have all 3 months in the month column. For example, given df:

id       month   
100        1
100        2
100        3
101        2
102        3

I would like the new df to contain only id 100, like this:

id       month   
100        1
100        2
100        3

So what I've done is

df.groupby(['id']).month.count() == 3

which gives me

id        month
100        True
101        False
102        False

I'm currently stuck on how to continue.

CodePudding user response:

You can use groupby with transform('nunique'), compare the result with 3, and use the resulting boolean Series to index df:

df[df.groupby('id')['month'].transform('nunique').eq(3)]

output:

    id  month
0  100      1
1  100      2
2  100      3

NB. If you are sure there are no duplicated months within an id, transform('count') will also work.
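
A self-contained sketch of this approach, using the sample data from the question. The second frame, df_dup, and its id 103 are made up here purely to illustrate the NB above: with a repeated month, 'count' and 'nunique' give different results. Note that both variants only check that an id has 3 distinct months, which matches "all 3 months" only if month can take no other values.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [100, 100, 100, 101, 102],
    "month": [1, 2, 3, 2, 3],
})

# For every row, the number of distinct months seen for that row's id
n_months = df.groupby("id")["month"].transform("nunique")

# Keep only the rows whose id has exactly 3 distinct months
result = df[n_months.eq(3)]
print(result)

# Hypothetical data (not from the question): id 103 has 3 rows
# but only 2 distinct months, because month 1 is repeated
df_dup = pd.DataFrame({
    "id": [100, 100, 100, 103, 103, 103],
    "month": [1, 2, 3, 1, 1, 2],
})

# 'count' counts rows, so id 103 passes the test
by_count = df_dup[df_dup.groupby("id")["month"].transform("count").eq(3)]

# 'nunique' counts distinct months, so id 103 is dropped
by_nunique = df_dup[df_dup.groupby("id")["month"].transform("nunique").eq(3)]
```

Because transform returns a Series aligned with the original rows, the boolean mask can index df directly, with no merge or isin step.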

CodePudding user response:

I think you are close, but you need to modify your code a bit. Swap count for nunique; the comparison with 3 then gives a Series, indexed by id, that is True or False depending on whether that id has all three months. Then you can filter:

t = (df.groupby(['id']).month.nunique() == 3)
print(df.loc[df.id.isin(t[t].index)])

    id  month
0  100      1
1  100      2
2  100      3
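
The same steps as a runnable sketch, with the sample frame from the question rebuilt inline. The names keep_ids and filtered are introduced here only for readability and are not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [100, 100, 100, 101, 102],
    "month": [1, 2, 3, 2, 3],
})

# Boolean Series indexed by id: True where the id has 3 distinct months
t = df.groupby("id")["month"].nunique() == 3

# t[t] keeps only the True entries; its index holds the ids to keep
keep_ids = t[t].index

# Select the rows of df whose id is among those ids
filtered = df.loc[df["id"].isin(keep_ids)]
print(filtered)
```

Unlike the transform version, this aggregates to one value per id first and then maps back to rows with isin, which can be handy if the list of qualifying ids is needed on its own.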