I'm trying to delete every id that doesn't contain all 3 months in the month column.
For example, we have df as:
id month
100 1
100 2
100 3
101 2
102 3
Then I would like the new df to contain only id 100, like this:
id month
100 1
100 2
100 3
So what I've done is
df.groupby(['id']).month.count() == 3
which gives me
id month
100 True
101 False
102 False
I'm currently stuck on how to continue.
CodePudding user response:
You can use groupby with transform('nunique') and slice on the boolean output after comparing it with 3:
df[df.groupby('id')['month'].transform('nunique').eq(3)]
output:
id month
0 100 1
1 100 2
2 100 3
NB. If you are sure there are no duplicated months, transform('count') will also work.
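To make the difference between 'nunique' and 'count' concrete, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and adds one hypothetical id, 103, that repeats a month; that extra id is an assumption for illustration and is not in the original data.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical id 103
# (not in the original post) that has month 1 twice.
df = pd.DataFrame({
    "id":    [100, 100, 100, 101, 102, 103, 103, 103],
    "month": [1,   2,   3,   2,   3,   1,   1,   2],
})

# transform broadcasts each group's result back onto every row of that group,
# so the result lines up with df and can be used directly as a row mask.
n_unique = df.groupby("id")["month"].transform("nunique")
n_rows = df.groupby("id")["month"].transform("count")

by_nunique = df[n_unique.eq(3)]  # ids with 3 distinct months
by_count = df[n_rows.eq(3)]      # ids with 3 rows, duplicates included

print(by_nunique)
print(by_count)
```

With this data, the 'nunique' mask keeps only id 100, while the 'count' mask also keeps id 103 because it has three rows despite covering only two distinct months.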
CodePudding user response:
I think you are close, but you need to modify your code a bit. Use your code but swap count with nunique. Comparing the result with 3 returns a Series that marks each id as True or False depending on whether it has all three months. Then, you can filter:
t = (df.groupby(['id']).month.nunique() == 3)
print(df.loc[df.id.isin(t[t].index)])
id month
0 100 1
1 100 2
2 100 3
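For anyone who wants to run this end to end, the same steps can be sketched with the intermediate values spelled out. The variable names keep_ids and result are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id":    [100, 100, 100, 101, 102],
    "month": [1,   2,   3,   2,   3],
})

# One boolean per id: True when the id has exactly 3 distinct months.
t = df.groupby("id")["month"].nunique() == 3

# t[t] keeps only the True entries; its index holds the ids to keep.
keep_ids = t[t].index

# Keep every row whose id is in that set.
result = df.loc[df["id"].isin(keep_ids)]
print(result)
```

Unlike transform, this route aggregates to one value per id first and then maps back to rows with isin, which can be handy if the list of qualifying ids is needed elsewhere.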