I have a df with IDs and Dates (. Here is an example:
ID Date
1 26.04.2011
1 21.10.2011
14 25.02.2010
14 08.07.2010
14 20.10.2010
14 07.01.2011
14 20.04.2011
14 02.07.2011
14 11.10.2011
14 23.01.2012
14 19.04.2012
14 22.10.2012
14 15.01.2013
14 06.05.2013
18 23.11.2012
18 05.06.2013
18 19.08.2013
18 11.04.2014
18 18.07.2014
ID object
Date datetime64[ns]
I want to keep only those rows where there are =< 3 Dates per year per ID. So the result should be:
ID Date
14 25.02.2010
14 08.07.2010
14 20.10.2010
14 07.01.2011
14 20.04.2011
14 02.07.2011
14 11.10.2011
14 23.01.2012
14 19.04.2012
14 22.10.2012
I tried groupby and size:
ID year size
0 1 2011 2
1 14 2010 3
2 14 2011 4
However this is not what I want.
CodePudding user response:
Use GroupBy.transform
per ID
and years with count by GroupBy.size
, compare for greater or equal by Series.ge
and filter in boolean indexing
:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df[df.groupby(['ID',df['Date'].dt.year])['ID'].transform('size').ge(3)]
print (df)
ID Date
2 14 2010-02-25
3 14 2010-07-08
4 14 2010-10-20
5 14 2011-01-07
6 14 2011-04-20
7 14 2011-07-02
8 14 2011-10-11
9 14 2012-01-23
10 14 2012-04-19
11 14 2012-10-22