Aggregating over two columns (sum and min) [duplicate]

Time:09-22

I have the following dataframe:

name    item        price   timestamp
______________________________________
Elliot   a            10        12312
Sara     b            20        45654
Elliot   a            30        15000
Tom      a            5         43423

Elliot, for example, consumed item 'a' twice. I want to aggregate all such cases so that each 'name'-'item' pair occurs only once, with 'price' summed and 'timestamp' set to the minimum within the group:

name    item        price   timestamp
______________________________________
Elliot   a            40        12312
Sara     b            20        45654
Tom      a            5         43423

How can this be done efficiently?

Answer:

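Group by both key columns, name and item, and use groupby.agg, which takes a dictionary with column names as keys and aggregation functions as values. Grouping by name alone and taking the first item would silently merge different items bought by the same person.

# group on the (name, item) pair so each pair ends up as a single row
calcs = {'price': 'sum', 'timestamp': 'min'}
df.groupby(['name', 'item']).agg(calcs).reset_index()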

which prints:

     name item  price  timestamp
0  Elliot    a     40      12312
1    Sara    b     20      45654
2     Tom    a      5      43423
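A self-contained sketch of the dictionary approach that groups on both name and item, rebuilding the sample frame from the question. The extra row for Elliot buying item 'b' (price 7, timestamp 11111) is hypothetical, added only to show that different items bought by the same person stay in separate rows.

```python
import pandas as pd

# Sample data from the question, plus one HYPOTHETICAL extra row
# (Elliot buying item 'b') to show that different items stay separate.
df = pd.DataFrame({
    'name': ['Elliot', 'Sara', 'Elliot', 'Tom', 'Elliot'],
    'item': ['a', 'b', 'a', 'a', 'b'],
    'price': [10, 20, 30, 5, 7],
    'timestamp': [12312, 45654, 15000, 43423, 11111],
})

# One row per (name, item) pair: prices are summed, timestamps take the minimum.
calcs = {'price': 'sum', 'timestamp': 'min'}
result = df.groupby(['name', 'item']).agg(calcs).reset_index()
print(result)
```

With the extra row, Elliot gets two rows: item 'a' with price 40 and timestamp 12312, and item 'b' with price 7 and timestamp 11111. Grouping by name alone would have collapsed both into one row.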

You can also use named aggregation with pd.NamedAgg, where you choose the output column names and set a different aggregation per column:

>>> df.groupby(['name', 'item']).agg(
...     total_price=pd.NamedAgg(column='price', aggfunc='sum'),
...     min_timestamp=pd.NamedAgg(column='timestamp', aggfunc='min'))

             total_price  min_timestamp
name   item
Elliot a              40          12312
Sara   b              20          45654
Tom    a               5          43423
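pandas also accepts plain (column, aggfunc) tuples as shorthand for pd.NamedAgg. A minimal sketch, assuming the sample data from the question; the output names total_price and min_timestamp are arbitrary choices for illustration, and reset_index() turns the name/item MultiIndex back into ordinary columns.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'name': ['Elliot', 'Sara', 'Elliot', 'Tom'],
    'item': ['a', 'b', 'a', 'a'],
    'price': [10, 20, 30, 5],
    'timestamp': [12312, 45654, 15000, 43423],
})

# Named aggregation with (column, aggfunc) tuples: each keyword becomes an
# output column name. reset_index() returns name/item as regular columns.
result = df.groupby(['name', 'item']).agg(
    total_price=('price', 'sum'),
    min_timestamp=('timestamp', 'min'),
).reset_index()
print(result)
```

The tuple form and pd.NamedAgg are interchangeable here; the tuple form is simply shorter to type.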