How to apply scipy.integrate.cumulative_trapezoid to grouped Pandas dataframe with transform?

I have a DataFrame containing several time-series signals (one per id), something like:

id | x | y
-------------------
A  | 1 | 3.4
B  | 1 | 2.1
C  | 1 | 1.0
A  | 2 | 2.0
B  | 2 | 0.8
A  | 5 | 5.2
C  | 5 | 1.2

I want to get the cumulative trapezoidal integral of each id's time series (y integrated over x) as a new column. Something like:

id | x | y    | cumtrap
-------------------
A  | 1 | 3.4  | 0.0
B  | 1 | 2.1  | 0.0
C  | 1 | 1.0  | 0.0
A  | 2 | 2.0  | 2.7
B  | 2 | 0.8  | 1.45
A  | 5 | 5.2  | 13.5
C  | 5 | 1.2  | 4.4
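The cumtrap column is the running sum of trapezoid areas (x[i] - x[i-1]) * (y[i] + y[i-1]) / 2 within each id, starting at 0. As a quick hand check of the expected values, here is group A computed with plain NumPy instead of SciPy (the variable names are just for illustration):

```python
import numpy as np

# Rows of group A from the example table, already in increasing x order
x = np.array([1.0, 2.0, 5.0])
y = np.array([3.4, 2.0, 5.2])

# Area of each trapezoid between neighbouring samples: width * mean height
# areas: 1 * (3.4 + 2.0) / 2 = 2.7 and 3 * (2.0 + 5.2) / 2 = 10.8
areas = np.diff(x) * (y[1:] + y[:-1]) / 2

# Running total with a leading 0 for the first sample (what initial=0 does)
cumtrap = np.concatenate(([0.0], np.cumsum(areas)))
print(cumtrap)
```

The same arithmetic gives 1.45 for B and 4.4 for C.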

I can apply cumulative_trapezoid to each group with groupby().apply():

from scipy.integrate import cumulative_trapezoid
import pandas as pd

df = pd.DataFrame(
    {
        'id': ['A', 'B', 'C', 'A', 'B', 'A', 'C'],
        'x': [1, 1, 1, 2, 2, 5, 5],
        'y': [3.4, 2.1, 1.0, 2.0, 0.8, 5.2, 1.2]
    }
)

df.groupby('id').apply(lambda df: cumulative_trapezoid(df["y"], df["x"], initial=0))

This gets me:

id
A             [0.0, 2.7, 13.5]
B    [0.0, 1.4500000000000002]
C                   [0.0, 4.4]
dtype: object

But I'd like to insert the results back into the original DataFrame as a new column. Switching apply to transform is effectively what I want, but transform works on one column at a time (the function receives each column as a Series), so I can't use keys to refer to the "x" and "y" columns.

# Doesn't work (KeyError)
df.groupby('id').transform(lambda df: cumulative_trapezoid(df["y"], df["x"], initial=0))

What's the best way to insert these arrays for each group back into the dataframe?

Answer:

Let's try apply with assign, which returns each group with the new column attached:

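# group_keys=False keeps the original row index and order; since pandas 2.0 the default
# would add id as an outer index level and list the rows group by group
out = df.groupby('id', group_keys=False).apply(lambda g: g.assign(cumtrap=cumulative_trapezoid(g["y"], g["x"], initial=0)))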

print(out)

  id  x    y  cumtrap
0  A  1  3.4     0.00
1  B  1  2.1     0.00
2  C  1  1.0     0.00
3  A  2  2.0     2.70
4  B  2  0.8     1.45
5  A  5  5.2    13.50
6  C  5  1.2     4.40
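If you'd rather stay with transform as in the question, one option (not part of the answer above) is to transform only the y column. SeriesGroupBy.transform accepts a function that returns an array of the same length as the group, and each group's Series keeps the original row labels, so the matching x values can be looked up with df.loc. A sketch, assuming a unique row index and x already sorted within each id:

```python
import pandas as pd
from scipy.integrate import cumulative_trapezoid

df = pd.DataFrame(
    {
        'id': ['A', 'B', 'C', 'A', 'B', 'A', 'C'],
        'x': [1, 1, 1, 2, 2, 5, 5],
        'y': [3.4, 2.1, 1.0, 2.0, 0.8, 5.2, 1.2]
    }
)

# transform hands the function one id's slice of the y column at a time;
# its index (the original row labels) is used to fetch the matching x values
df['cumtrap'] = df.groupby('id')['y'].transform(
    lambda s: cumulative_trapezoid(s, df.loc[s.index, 'x'], initial=0)
)
print(df)
```

Because transform returns a result aligned to the original rows, no reordering or index cleanup is needed afterwards. The trade-off is that the lambda reads x from the enclosing df, so it only works when the row labels are unique.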