I'm trying to calculate the cumulative AUC (area under the curve) of a DataFrame column, from the first row up to the current row.
Example:
index | points | AUC |
---|---|---|
0 | 0 | 0 |
1 | 1 | 0.5 |
2 | 2 | 2 |
3 | 3 | 4.5 |
4 | 4 | 8 |
5 | 5 | 12.5 |
6 | 4 | 17 |
7 | 0 | 19 |
8 | -2 | 18 |
9 | -2 | 16 |
I can use np.trapz(), but I have to calculate it row by row with a for loop:
import numpy as np
for i in range(len(df)):
    # integrate from the first row up to and including row i
    df.loc[df.index[i], "AUC"] = np.trapz(df["points"].iloc[:i + 1])
Is there any way to apply it to the whole column without using a for loop?
The second problem is that my dataframe gets updated every minute. So either I recalculate the cumulative AUC from the beginning of the df, which takes longer and longer, or I take only part of the df (e.g. df.tail(25)) and apply a function to it, in which case I lose the AUC of the curve before iloc[-25].
Answer:
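I would try something like this (it reproduces the trapezoidal result when the first value of points is 0, as in your example; otherwise the result is offset by that first value):
np.cumsum(df.points) - np.concatenate(([0], np.cumsum(np.diff(df.points) / 2)), axis=0)
Here is a working example: https://abstra.show/dezL0ASX4s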
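
To make both parts of the question concrete, here is a minimal sketch, not a definitive implementation. It assumes unit spacing between rows (which is what np.trapz uses when no x values are given). The names cumulative_trapz and extend_auc are hypothetical helpers introduced only for illustration. The first sums the per-segment trapezoid areas, so it should also hold when the first value is not 0. The second shows one way to handle the once-a-minute update: keep the last AUC and the last point, and add only the newest segment instead of recomputing from the start.

```python
import numpy as np


def cumulative_trapz(points):
    """Cumulative trapezoidal area, unit spacing; result[i] covers points[0..i]."""
    p = np.asarray(points, dtype=float)
    # Each segment between neighbouring rows contributes the mean of its endpoints.
    segments = (p[1:] + p[:-1]) / 2.0
    # The first row has no segment behind it, so its area is 0.
    return np.concatenate(([0.0], np.cumsum(segments)))


def extend_auc(last_auc, last_point, new_point):
    """Area after one new point arrives, given the previous running total."""
    return last_auc + (last_point + new_point) / 2.0


# The "points" column from the question.
points = [0, 1, 2, 3, 4, 5, 4, 0, -2, -2]
auc = cumulative_trapz(points)
print(auc)

# Incremental update: a new value (here 3, made up for illustration) arrives.
new_point = 3
new_auc = extend_auc(auc[-1], points[-1], new_point)
print(new_auc)
```

With a DataFrame this would presumably be used as df["AUC"] = cumulative_trapz(df["points"].to_numpy()). For the live update, only the last AUC value and the last point need to be kept, so the cost per new row stays constant. If SciPy is available, scipy.integrate.cumulative_trapezoid(y, initial=0) should give the same cumulative result.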