How to calculate mean cycle wise in python-CodePudding

How can I calculate cycle-wise average (mean for values circled in red) only for true conditions (cycle value 1) for other column DCUL13.LC01? Is there any function or can someone help me with the code?

CodePudding user response：

You can aggregate mean for consecutive 1 create by cumulative sum and filtered with inverted mask by boolean indexing:

df = pd.DataFrame({'col' : range(15),'cycle' : [1,1,1,1,0,1,0,0,1,0,1,1,0,1,0]})
print (df)
    col  cycle
0     0      1
1     1      1
2     2      1
3     3      1
4     4      0
5     5      1
6     6      0
7     7      0
8     8      1
9     9      0
10   10      1
11   11      1
12   12      0
13   13      1
14   14      0

m = df['cycle'].eq(0)
df1 = df[~m].groupby(m.cumsum())['col'].mean().reset_index(name='Average')
print (df1)
   cycle  Average
0      0      1.5
1      1      5.0
2      3      8.0
3      4     10.5
4      5     13.0

Details:

print (df.assign(cumsum=m.cumsum(), inv_mask = ~m))
    col  cycle  cumsum  inv_mask
0     0      1       0      True
1     1      1       0      True
2     2      1       0      True
3     3      1       0      True
4     4      0       1     False
5     5      1       1      True
6     6      0       2     False
7     7      0       3     False
8     8      1       3      True
9     9      0       4     False
10   10      1       4      True
11   11      1       4      True
12   12      0       5     False
13   13      1       5      True
14   14      0       6     False

CodePudding user response：

I would rename your second column as it seems to not represent the cycle number but some other value (as shown by your annotations).

I would then create another column in your dataframe called cycle that contains the corresponding cycle number (e.g. in your case, 1 for the first 5 rows, 2 for the next 3 rows etc).

You can then use fancy indexing and the pandas groupby function to calculate the mean, as shown below (with renamed column names, only two cycles and slightly edited data so as to show the mean is being calculated correctly):

x = {'A' : [277, 277, 277, 277, 300, 277, 276, 276],
     'B' : [1, 1, 1, 1, 0, 1, 0, 0],
     'cycle' : [1, 1, 1, 1, 1, 2, 2, 2,]}

x = pd.DataFrame(x)

means_of_A = x[x['B']==1].groupby('cycle').mean()['A']

means_of_A is then equal to:

cycle
1    277.0
2    277.0
Name: A, dtype: float64

which is equivalent to what you need.