Home > OS >  How to keep groups with at least one value less than 24 using pandas groupby.filter
How to keep groups with at least one value less than 24 using pandas groupby.filter

Time:12-06

I have a dataframe that looks something like this:

CCode   CId      State       Age
326112  1217012  Nebraska    20
326112  1217012  Nebraska    75
326112  1217012  Nebraska    107

320113  1317012  Alabama     37
320113  1317012  Alabama     38
320113  1317012  Alabama     39

320180  8017061  Arkansas    37
320180  8017061  Arkansas    38
320180  8017061  Arkansas    39

326112  1217011  California  36
326112  1217011  California  37
326112  1217011  California  38

I am trying to keep groups that have at least one age below 24, and re-group the filtered groups.

I tried grouping by 4 columns and filtering with a chained lambda:

df.groupby(['State', 'CId', 'CCode', 'Age']).filter(lambda x:x['Age'].sum()<23).groupby(['State', 'CId', 'CCode', 'Age']).std()

This loses the rest of the group:

CCode   CId      State       Age
326112  1217012  Nebraska    20

Needed output:

CCode   CId      State       Age
326112  1217012  Nebraska    20
326112  1217012  Nebraska    75
326112  1217012  Nebraska    107

CodePudding user response:

Based on clarification from the comments:

what I would like to do is keep the groups that have an age starting below 24

  1. Remove Age from the grouper
  2. Test if any Age in the group is lt (less than) 24
df.groupby(['State', 'CId', 'CCode']).filter(lambda g: g.Age.lt(24).any())

#     CCode      CId     State  Age
# 0  326112  1217012  Nebraska   20
# 1  326112  1217012  Nebraska   75
# 2  326112  1217012  Nebraska  107

CodePudding user response:

If you have to strictly filter based on the starting age only, one way would be to use iloc:

df.groupby(["State", "CId", "CCode"]).filter(lambda x: x["Age"].iloc[0] < 24)

Output:

    CCode      CId     State  Age
0  326112  1217012  Nebraska   20
1  326112  1217012  Nebraska   75
2  326112  1217012  Nebraska  107
  • Related