Home > Back-end >  Python pandas aggregate groups
Python pandas aggregate groups

Time:06-14

I have a pandas dataframe like below.

   id  A  B  C
0   1  1  1  1
1   1  5  7  2
2   2  6  9  3
3   3  1  5  4
4   3  4  6  2

After evaluating conditions,

   id  A  B  C  a_greater_than_b  b_greater_than_c  c_greater_than_a
0   1  1  1  1             False             False             False
1   1  5  7  2             False              True             False
2   2  6  9  3             False              True             False
3   3  1  5  4             False              True              True
4   3  4  6  2             False              True             False

And after evaluating conditions, want to aggregate the results per id.

id  a_greater_than_b  b_greater_than_c  c_greater_than_a
1   False             False             False
2   False             True              False
3   False             True              False

CodePudding user response:

Group by id and aggregate the condition columns using all

agg_cols = ['a_greater_than_b', 'b_greater_than_c',  'c_greater_than_a']
res = df.groupby('id')[agg_cols].all()

Output:

>>> df

   id  A  B  C  a_greater_than_b  b_greater_than_c  c_greater_than_a
0   1  1  1  1             False             False             False
1   1  5  7  2             False              True             False
2   2  6  9  3             False              True             False
3   3  1  5  4             False              True              True
4   3  4  6  2             False              True             False

>>> res

    a_greater_than_b  b_greater_than_c  c_greater_than_a
id                                                      
1              False             False             False
2              False              True             False
3              False              True             False

CodePudding user response:

use group by on id and any as the aggregate function

txt="""
1  1  1  1
1  5  7  2
2  6  9  3
3  1  5  4
3  4  6  2
"""

df = pd.DataFrame(columns=['id', 'A', 'B', 'C'])
for line in txt.split('\n'):
    if line.strip():
        df.loc[len(df)] = line.split()
            
print(df)

df['a_greater_than_b']=df['A']>df['B']
df['b_greater_than_c']=df['B']>df['C']
df['c_greater_than_a']=df['C']>df['A']
grouped=df.groupby('id').agg({'a_greater_than_b': 'any', 'b_greater_than_c': 'any', 'c_greater_than_a': 'any'})
print(grouped)

output:

 a_greater_than_b  b_greater_than_c  c_greater_than_a
id                                                      
1              False              True             False
2              False              True             False
3              False              True              True
  • Related