I have a dataframe that looks something like this:
CCode CId State Age
326112 1217012 Nebraska 20
326112 1217012 Nebraska 75
326112 1217012 Nebraska 107
320113 1317012 Alabama 37
320113 1317012 Alabama 38
320113 1317012 Alabama 39
320180 8017061 Arkansas 37
320180 8017061 Arkansas 38
320180 8017061 Arkansas 39
326112 1217011 California 36
326112 1217011 California 37
326112 1217011 California 38
I am trying to keep groups that have at least one age below 24, and re-group the filtered groups.
I tried grouping by 4 columns and filtering with a chained lambda:
df.groupby(['State', 'CId', 'CCode', 'Age']).filter(lambda x:x['Age'].sum()<23).groupby(['State', 'CId', 'CCode', 'Age']).std()
This loses the rest of the group:
CCode CId State Age
326112 1217012 Nebraska 20
Needed output:
CCode CId State Age
326112 1217012 Nebraska 20
326112 1217012 Nebraska 75
326112 1217012 Nebraska 107
CodePudding user response:
Based on clarification from the comments:
what I would like to do is keep the groups that have an age starting below 24
df.groupby(['State', 'CId', 'CCode']).filter(lambda g: g.Age.lt(24).any())
# CCode CId State Age
# 0 326112 1217012 Nebraska 20
# 1 326112 1217012 Nebraska 75
# 2 326112 1217012 Nebraska 107
CodePudding user response:
If you have to strictly filter based on the starting age only, one way would be to use iloc
:
df.groupby(["State", "CId", "CCode"]).filter(lambda x: x["Age"].iloc[0] < 24)
Output:
CCode CId State Age
0 326112 1217012 Nebraska 20
1 326112 1217012 Nebraska 75
2 326112 1217012 Nebraska 107