Home > Net >  Pandas groupby and filtering groups with rows greater than N rows
Pandas groupby and filtering groups with rows greater than N rows

Time:03-05

I have a pandas df of the following format

STOCK    YR    MONTH  DAY     PRICE
AAA     2022    1     1        10
AAA     2022    1     2        11
AAA     2022    1     3        10
AAA     2022    1     4        15
AAA     2022    1     5        10
BBB     2022    1     1        5
BBB     2022    1     2        10
BBB     2022    2     1        10
BBB     2022    2     2        15

What I am looking to do is to filter this df such that I am grouping by STOCK and YR and MONTH and selecting the groups with 3 or more entries. So the resulting df looks like

STOCK    YR    MONTH  DAY     PRICE
AAA     2022    1     1        10
AAA     2022    1     2        11
AAA     2022    1     3        10
AAA     2022    1     4        15
AAA     2022    1     5        10

Note that BBB is eliminated as it had only 2 rows in each group, when grouped by STOCK, YR and MONTH

I have tried df.groupby(['STOCK','YR','MONTH']).filter(lambda x: x.STOCK.nunique() > 5) but this resulted in an empty frame.

Also tried df.groupby(['STOCK','YR','MONTH']).filter(lambda x: x['STOCK','YR','MONTH'].nunique() > 5) but this resulted in a KeyError: ('STOCK', 'YR', 'MONTH')

Thanks!

CodePudding user response:

Use GroupBy.transform('count'):

df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('count').ge(3)]

or 'size':

df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('size').ge(3)]

output:

  STOCK    YR  MONTH  DAY  PRICE
0   AAA  2022      1    1     10
1   AAA  2022      1    2     11
2   AAA  2022      1    3     10
3   AAA  2022      1    4     15
4   AAA  2022      1    5     10

CodePudding user response:

Use GroupBy.transform:

If need counts (not exclude possible NaNs):

#if need test number of unique values
df[df.groupby(['STOCK', 'YR', 'MONTH'])['STOCK'].transform('size').gt(3)]

Or:

#in large df should be slow
df.groupby(['STOCK','YR','MONTH']).filter(lambda x: len(x) > 3)
  • Related