I have a dataframe consisting of timeseries signals, something like:
id | x | y
-------------------
A | 1 | 3.4
B | 1 | 2.1
C | 1 | 1.0
A | 2 | 2.0
B | 2 | 0.8
A | 5 | 5.2
C | 5 | 1.2
I want to get the cumulative trapezoidal sums from integrating each id's time series. Something like:
id | x | y | cumtrap
-------------------
A | 1 | 3.4 | 0.0
B | 1 | 2.1 | 0.0
C | 1 | 1.0 | 0.0
A | 2 | 2.0 | 2.7
B | 2 | 0.8 | 1.45
A | 5 | 5.2 | 13.5
C | 5 | 1.2 | 4.4
I can apply the cumulative_trapezoid function after the groupby:
from scipy.integrate import cumulative_trapezoid
import pandas as pd
df = pd.DataFrame(
{
'id': ['A', 'B', 'C', 'A', 'B', 'A', 'C'],
'x': [1, 1, 1, 2, 2, 5, 5],
'y': [3.4, 2.1, 1.0, 2.0, 0.8, 5.2, 1.2]
}
)
df.groupby('id').apply(lambda df: cumulative_trapezoid(df["y"], df["x"], initial=0))
This gets me:
id
A [0.0, 2.7, 13.5]
B [0.0, 1.4500000000000002]
C [0.0, 4.4]
dtype: object
But I'd also like to insert the results back into the dataframe. Switching the apply to a transform is effectively what I want, but transform only applies to one column at a time, so I can't use keys to refer to the columns.
# Doesn't work (raises KeyError)
df.groupby('id').transform(lambda df: cumulative_trapezoid(df["y"], df["x"], initial=0))
What's the best way to insert these arrays for each group back into the dataframe?
CodePudding user response:
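Let's try assign inside the apply, so each group comes back with the new column attached. Passing group_keys=False keeps the original row index; on pandas 2.0 and later the default would otherwise prepend the group key as an extra index level.
out = df.groupby('id', group_keys=False).apply(lambda g: g.assign(cumtrap=cumulative_trapezoid(g["y"], g["x"], initial=0)))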
print(out)
id x y cumtrap
0 A 1 3.4 0.00
1 B 1 2.1 0.00
2 C 1 1.0 0.00
3 A 2 2.0 2.70
4 B 2 0.8 1.45
5 A 5 5.2 13.50
6 C 5 1.2 4.40
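If you would rather stay with transform, one possible sketch is to group only the y column and look up the matching x values through each group's index. This assumes the dataframe has a unique index and that the rows of each id are already ordered by x; neither is checked here. It also sidesteps the way apply handles the grouping column, which has been changing in recent pandas releases.

```python
import pandas as pd
from scipy.integrate import cumulative_trapezoid

df = pd.DataFrame(
    {
        "id": ["A", "B", "C", "A", "B", "A", "C"],
        "x": [1, 1, 1, 2, 2, 5, 5],
        "y": [3.4, 2.1, 1.0, 2.0, 0.8, 5.2, 1.2],
    }
)

# transform hands the lambda one group's "y" Series at a time; the matching
# "x" values are fetched from the outer frame via that Series' index
# (assumes the index is unique and each group is sorted by x).
df["cumtrap"] = df.groupby("id")["y"].transform(
    lambda y: cumulative_trapezoid(y, df.loc[y.index, "x"], initial=0)
)

print(df)
```

Because transform aligns its result with the original index, the new column can be assigned straight back to df without any reordering.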