counter in a column for non-zero data, taking into account values in another column-CodePudding

I have daily sensor readings. It is necessary to set a counter of non-zero values by sensors and by the number of readings, i.e. from the first non-zero value to zero - 1; from the next non-zero value to zero -2; in this case, for a new sensor, the counter must start again from 1.I need periods of continuous sensor signal. When the first signal comes - 1; the second signal is 2; the third signal is 3. New sensor, the counter starts again from 1. This is necessary for the analysis of periods (groupby).

Example dataframe:

import pandas as pd

df = pd.DataFrame({
  'Date':    ['01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','06.02.2021','07.02.2021','08.02.2021','09.02.2021','10.02.2021','11.02.2021'],
  'Sensors': [1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3],
  'Value':   [0,2,3,5,0,22,3,0,11,22,0,0,37,8,0,95] })

df
          Date  Sensors  Value
0   01.01.2020        1      0
1   02.01.2020        1      2
2   03.01.2020        1      3
3   04.01.2020        1      5
4   05.01.2020        1      0
5   01.01.2020        1     22
6   02.01.2020        2      3
7   03.01.2020        2      0
8   04.01.2020        2     11
9   05.01.2020        2     22
10  06.02.2021        3      0
11  07.02.2021        3      0
12  08.02.2021        3     37
13  09.02.2021        3      8
14  10.02.2021        3      0
15  11.02.2021        3     95

results:

d = {
      'Date':['01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','06.02.2021','07.02.2021','08.02.2021','09.02.2021','10.02.2021','11.02.2021'],
      'Sensors': [1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3],
      'Value':   [0,2,3,5,0,22,3,0,11,22,0,0,37,8,0,95],
      'Count':   [0,1,1,1,0,2,1,0,2,2,0,0,1,1,0,2]

}
df = pd.DataFrame(data=d)
df
          Date  Sensors   Value  Count
0   01.01.2020         1      0      0
1   02.01.2020         1      2      1
2   03.01.2020         1      3      1
3   04.01.2020         1      5      1
4   05.01.2020         1      0      0
5   01.01.2020         1     22      2
6   02.01.2020         2      3      1
7   03.01.2020         2      0      0
8   04.01.2020         2     11      2
9   05.01.2020         2     22      2
10  06.02.2021         3      0      0
11  07.02.2021         3      0      0
12  08.02.2021         3     37      1
13  09.02.2021         3      8      1
14  10.02.2021         3      0      0
15  11.02.2021         3     95      2

CodePudding user response：

So you want to groupby 'Sensors;', then have a cumulative counter of nonzero 'Value's. First I started with:

df.groupby('Sensors').apply(lambda row: row.assign(Count = (row['Value']==0).cumsum()*(row['Value']!=0) )).reset_index(drop=True)

          Date  Sensors  Value  Count
0   01.01.2020        1      0      0
1   02.01.2020        1      2      1
2   03.01.2020        1      3      1
3   04.01.2020        1      5      1
4   05.01.2020        1      0      0
5   01.01.2020        1     22      2
6   02.01.2020        2      3      0  <-- should start at 1 not 0
7   03.01.2020        2      0      0
8   04.01.2020        2     11      1
9   05.01.2020        2     22      1
10  06.02.2021        3      0      0
11  07.02.2021        3      0      0
12  08.02.2021        3     37      2  <-- should start at 1 not 2
13  09.02.2021        3      8      2
14  10.02.2021        3      0      0
15  11.02.2021        3     95      3

This was close, but as you can see slightly off, we need to number the zero'th rows in each group to start at 0 if they're zero, 1 if they're nonzero.

Then I did this aiming to fix it:

df.groupby('Sensors').apply(lambda row: row.assign(Count = (row['Value']!=0).diff() ))

but:

the diff gives a leading Nan, which you need to fillna() from I think the .iloc[0] value. Finally you can take .cumsum(). Then do .reset_index() to drop the groupby's multi-index. Anyway you get the idea.
Also, (row['Value']!=0) gives a boolean rather than an int. Doing .diff() on that boolean only gives True/False(/NaN), not 1/0/-1(/NaN) which is more useful. So maybe (row['Value']!=0).astype(int).diff() better.