Creating a Python function that will calculate an average value or sum values based on n previous va-CodePudding

I have a dataframe of biometric data. I need to reduce it down to the rows that only contain a value for 'CGM'. While I know how to remove all nulls in a column I want to make sure the values from the other columns are averaged / summed before doing so.

I was trying to create an if statement but I'm relatively new to python and can't seem to find the code that works.

What I'd like the code to do would be to first check if CGM has a value and then move to HR column and get an average of that value and the previous n values (first instance this would be 4, second instance would be 15, frequency changes throughout dataframe). For steps, distance (these are populated throughout dataframe), calories, I'd like to sum the previous n values.

Any suggestions or code would be greatly appreciated. If there is something I haven't made clear please do let me know.

This would be the output I'd like to produce:

index	Time	CGM	HR	calories
0	2018-05-03 00:03:00	5	70.0	6.94
1	2018-05-03 00:18:00	4.8	72.533	17.8372
2	2018-05-03 00:33:00	4.8	72.4	17.8372

Sample Data:

index	Time	CGM	HR	calories
0	2018-05-03 00:00:00	NaN	70.0	1.1735
1	2018-05-03 00:01:00	NaN	71.0	1.1735
2	2018-05-03 00:02:00	NaN	69.0	1.1735
3	2018-05-03 00:03:00	5.0	70.0	1.1735
4	2018-05-03 00:04:00	NaN	71.0	1.1735
5	2018-05-03 00:05:00	NaN	72.0	1.29085
6	2018-05-03 00:06:00	NaN	71.0	1.1735
7	2018-05-03 00:07:00	NaN	72.0	1.1735
8	2018-05-03 00:08:00	NaN	73.0	1.1735
9	2018-05-03 00:09:00	NaN	71.0	1.1735
10	2018-05-03 00:10:00	NaN	73.0	1.1735
11	2018-05-03 00:11:00	NaN	73.0	1.29085
12	2018-05-03 00:12:00	NaN	71.0	1.1735
13	2018-05-03 00:13:00	NaN	73.0	1.1735
14	2018-05-03 00:14:00	NaN	78.0	1.1735
15	2018-05-03 00:15:00	NaN	77.0	1.1735
16	2018-05-03 00:16:00	NaN	71.0	1.1735
17	2018-05-03 00:17:00	NaN	72.0	1.1735
18	2018-05-03 00:18:00	4.8	70.0	1.1735
19	2018-05-03 00:19:00	NaN	74.0	1.1735
20	2018-05-03 00:20:00	NaN	77.0	1.29085
21	2018-05-03 00:21:00	NaN	72.0	1.1735
22	2018-05-03 00:22:00	NaN	75.0	1.29085
23	2018-05-03 00:23:00	NaN	73.0	1.1735
24	2018-05-03 00:24:00	NaN	72.0	1.1735
25	2018-05-03 00:25:00	NaN	75.0	1.1735
26	2018-05-03 00:26:00	NaN	71.0	1.1735
27	2018-05-03 00:27:00	NaN	69.0	1.1735
28	2018-05-03 00:28:00	NaN	70.0	1.1735
29	2018-05-03 00:29:00	NaN	68.0	1.1735
30	2018-05-03 00:30:00	NaN	72.0	1.1735
31	2018-05-03 00:31:00	NaN	75.0	1.1735
32	2018-05-03 00:32:00	NaN	69.0	1.1735
33	2018-05-03 00:33:00	4.8	74.0	1.1735

CodePudding user response：

Ok this may not be the best solution but it gives you what you want:

Starting from this df

df

    CGM HR      steps   distance    calories    Time
0   NaN 70.0    0.0     0.0         1.17350     2018-05-03 00:00:00
1   NaN 71.0    0.0     0.0         1.17350     2018-05-03 00:01:00
2   NaN 69.0    0.0     0.0         1.17350     2018-05-03 00:02:00
3   5.0 70.0    0.0     0.0         1.17350     2018-05-03 00:03:00
4   NaN 71.0    0.0     0.0         1.17350     2018-05-03 00:04:00
5   NaN 72.0    0.0     0.0         1.29085     2018-05-03 00:05:00
6   7.0 70.0    0.0     0.0         1.17350     2018-05-03 00:06:00

We will divide the dataframe based on non-NaN values and then aggregate the columns. Then we will concatenate the single dataframes to have the final one:

import pandas as pd
import numpy as np

# Splitting by non-nan values
events = np.split(df, np.where(~np.isnan(df.CGM))[0])

# removing empty DataFrames
events = [ev for ev in events if not ev.empty]

# Saving first timestamp because if Nan should be readded
first_nan = events[0][:1]
events = [ev.append(ev2[:1])[1:] for ev, ev2 in zip(events[:-1], events[1:])]

if first_nan['CGM'].isna().item():
    events[0] = first_nan.append(events[0])

# Preparing new dataframe
df_2 = pd.DataFrame()

# Concatenate aggregation of dataframe
for e in events:
  new_df = pd.DataFrame(e.agg({"Time": "max", "CGM":"sum", "HR": "mean", "steps": "sum", "distance": "sum", "calories":"sum"})).transpose()
  df_2 = pd.concat([df_2, new_df])

Output:

df_2

Time                    CGM   HR    steps   distance    calories

0   2018-05-03 00:03:00 5.0   70.0  0.0     0.0         4.694
0   2018-05-03 00:06:00 7.0   71.0  0.0     0.0         3.63785