Given the following dataframe:
name    item  price  timestamp
______________________________________
Elliot  a     10     12312
Sara    b     20     45654
Elliot  a     30     15000
Tom     a     5      43423
Elliot, for example, consumed item 'a' twice. I want to aggregate all such cases so that each 'name-item' pair occurs only once, with the 'price' summed and the 'timestamp' set to the minimum found:
name    item  price  timestamp
______________________________________
Elliot  a     40     12312
Sara    b     20     45654
Tom     a     5      43423
How can this be done efficiently?
CodePudding user response:
You can use groupby.agg, which takes a dictionary with your column names as keys and operations as values.
calcs = {'item':'first','price':'sum','timestamp':'min'}
df.groupby('name').agg(calcs).reset_index()
which prints:
     name item  price  timestamp
0  Elliot    a     40      12312
1    Sara    b     20      45654
2     Tom    a      5      43423
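Note that the snippet above groups by 'name' only and keeps the first 'item' it sees per name, which matches the expected output here only because each person bought a single kind of item. Since the question asks for one row per 'name-item' pair, a minimal sketch that groups on both columns could look like this (the dataframe is rebuilt from the question's sample data; the variable name `result` is just for illustration):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'name': ['Elliot', 'Sara', 'Elliot', 'Tom'],
    'item': ['a', 'b', 'a', 'a'],
    'price': [10, 20, 30, 5],
    'timestamp': [12312, 45654, 15000, 43423],
})

# Group on both key columns so that each (name, item) pair yields one row;
# as_index=False keeps 'name' and 'item' as regular columns
result = df.groupby(['name', 'item'], as_index=False).agg(
    {'price': 'sum', 'timestamp': 'min'}
)
print(result)
```

With this variant, a person who bought two different items would get two rows instead of having the second item silently dropped.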
You can also use pd.NamedAgg, which lets you control the output column names while applying a different aggregation to each column, as below:
>>> df.groupby('name').agg(
...     total_price_per_customer=pd.NamedAgg(column='price', aggfunc='sum'),
...     minimum_timestamp_per_customer=pd.NamedAgg(column='timestamp', aggfunc='min'))
        total_price_per_customer  minimum_timestamp_per_customer
name
Elliot                        40                           12312
Sara                          20                           45654
Tom                            5                           43423
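Named aggregation also accepts plain (column, aggfunc) tuples as a shorthand for pd.NamedAgg. The sketch below assumes pandas 0.25 or later, groups on both 'name' and 'item' as the question asks, and uses illustrative output names (`total_price`, `first_timestamp`) that are not from the original answer:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'name': ['Elliot', 'Sara', 'Elliot', 'Tom'],
    'item': ['a', 'b', 'a', 'a'],
    'price': [10, 20, 30, 5],
    'timestamp': [12312, 45654, 15000, 43423],
})

# Each keyword is an output column name; each value is (input column, aggregation)
named = (
    df.groupby(['name', 'item'])
      .agg(total_price=('price', 'sum'),
           first_timestamp=('timestamp', 'min'))
      .reset_index()  # turn the (name, item) index back into columns
)
print(named)
```

The reset_index() call is optional; leave it out if you prefer to keep 'name' and 'item' as the index of the result.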