I have dataframe that shows how many times ID append during every hour of day (from 0 to 23 hour).
Let's say it looks like this:
ID hour
0 1 0
1 1 0
2 1 0
3 1 0
4 1 0
... ... ...
10115 456 23
10116 456 23
10117 456 23
10118 456 23
10119 456 23
10120 rows × 2 columns
I want to find IDs that append less than 12 times during a day (so append less than 12 hours during a day)
I have tried:
df2.groupby("ID").hour.unique()
and output:
ID
1 [15, 20]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
5 [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
...
452 [15]
453 [15]
454 [16]
455 [16]
456 [16]
Name: hour, Length: 2853, dtype: object
If I am thinking correctly, it shows me how many times (and in which hour) ID appear as a list. How can I referer to all this lists that lenght is smaller than 12?
CodePudding user response:
Use nunique
instead of unique
:
>>> df2.groupby("ID")['hour'].nunique().loc[lambda x: x < 12].index
Int64Index([1, 452, 453, 454, 455, 456], dtype='int64', name='ID')