Home > Net >  Dynamic aggregation in Pandas DataFrame
Dynamic aggregation in Pandas DataFrame

Time:05-11

I have timestamped data in a dataframe of the form:

 ---- ------- ------- 
| ID | DATE  | VALUE |
 ---- ------- ------- 
|  1 | 01-01 |    10 |
|  1 | 01-01 |    20 |
|  1 | 02-01 |    20 |
|  1 | 02-01 |    25 |
|  1 | 03-01 |    30 |
|  2 | 01-01 |    10 |
|  2 | 02-01 |    20 |
|  2 | 02-01 |    30 |
|  2 | 03-01 |    30 |
 ---- ------- ------- 

Within the grouping by ID, I would now like to compute the difference of the value of each row to the average values of all rows that have equal or smaller dates, such that:

 ---- ------- ------- -------------------------------- 
| ID | DATE  | VALUE |          OUTPUT                |
 ---- ------- ------- -------------------------------- 
|  1 | 01-01 |    10 | 10-AVERAGE(10, 20)             |
|  1 | 01-01 |    20 | 20-AVERAGE(10, 20)             |
|  1 | 02-01 |    20 | 20-AVERAGE(10, 20, 20, 25)     |
|  1 | 02-01 |    25 | 25-AVERAGE(10, 20, 20, 25)     |
|  1 | 03-01 |    30 | 30-AVERAGE(10, 20, 20, 25, 30) |
|  2 | 01-01 |    10 | 10-AVERAGE(10)                 |
|  2 | 02-01 |    20 | 20-AVERAGE(10, 20, 30)         |
|  2 | 02-01 |    30 | 30-AVERAGE(10, 20, 30)         |
|  2 | 03-01 |    30 | 30-AVERAGE(10, 20, 30, 30)     |
 ---- ------- ------- -------------------------------- 

How can I achieve this with Pandas/NumPy withou creating a host of temporary columns?

I tried to create boolean masking columns and create temporary columns for every DATE that are only partially filled, but this is rather cumbersome.

CodePudding user response:

You could write a custom function to compute the averages and then call it with groupby.apply:

def expanding_average(frame):
    average = frame.groupby("DATE")["VALUE"].sum().expanding(1).sum().div(frame.groupby("DATE")["VALUE"].count().expanding(1).sum())
    return frame["DATE"].map(average)

df["Average"] = df["VALUE"].sub(df.groupby("ID").apply(expanding_average).droplevel(0))

>>> df
   ID   DATE  VALUE  Average
0   1  01-01     10    -5.00
1   1  01-01     20     5.00
2   1  02-01     20     1.25
3   1  02-01     25     6.25
4   1  03-01     30     9.00
5   2  01-01     10     0.00
6   2  02-01     20     0.00
7   2  02-01     30    10.00
8   2  03-01     30     7.50
  • Related