I have a pandas dataframe like below.
id A B C
0 1 1 1 1
1 1 5 7 2
2 2 6 9 3
3 3 1 5 4
4 3 4 6 2
After evaluating conditions,
id A B C a_greater_than_b b_greater_than_c c_greater_than_a
0 1 1 1 1 False False False
1 1 5 7 2 False True False
2 2 6 9 3 False True False
3 3 1 5 4 False True True
4 3 4 6 2 False True False
And after evaluating conditions, want to aggregate the results per id.
id a_greater_than_b b_greater_than_c c_greater_than_a
1 False False False
2 False True False
3 False True False
CodePudding user response:
Group by id
and aggregate the condition columns using all
agg_cols = ['a_greater_than_b', 'b_greater_than_c', 'c_greater_than_a']
res = df.groupby('id')[agg_cols].all()
Output:
>>> df
id A B C a_greater_than_b b_greater_than_c c_greater_than_a
0 1 1 1 1 False False False
1 1 5 7 2 False True False
2 2 6 9 3 False True False
3 3 1 5 4 False True True
4 3 4 6 2 False True False
>>> res
a_greater_than_b b_greater_than_c c_greater_than_a
id
1 False False False
2 False True False
3 False True False
CodePudding user response:
use group by on id and any as the aggregate function
txt="""
1 1 1 1
1 5 7 2
2 6 9 3
3 1 5 4
3 4 6 2
"""
df = pd.DataFrame(columns=['id', 'A', 'B', 'C'])
for line in txt.split('\n'):
if line.strip():
df.loc[len(df)] = line.split()
print(df)
df['a_greater_than_b']=df['A']>df['B']
df['b_greater_than_c']=df['B']>df['C']
df['c_greater_than_a']=df['C']>df['A']
grouped=df.groupby('id').agg({'a_greater_than_b': 'any', 'b_greater_than_c': 'any', 'c_greater_than_a': 'any'})
print(grouped)
output:
a_greater_than_b b_greater_than_c c_greater_than_a
id
1 False True False
2 False True False
3 False True True