I have daily sensor readings. It is necessary to set a counter of non-zero values by sensors and by the number of readings, i.e. from the first non-zero value to zero - 1; from the next non-zero value to zero -2; in this case, for a new sensor, the counter must start again from 1.I need periods of continuous sensor signal. When the first signal comes - 1; the second signal is 2; the third signal is 3. New sensor, the counter starts again from 1. This is necessary for the analysis of periods (groupby).
Example dataframe:
import pandas as pd
df = pd.DataFrame({
'Date': ['01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','06.02.2021','07.02.2021','08.02.2021','09.02.2021','10.02.2021','11.02.2021'],
'Sensors': [1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3],
'Value': [0,2,3,5,0,22,3,0,11,22,0,0,37,8,0,95] })
df
Date Sensors Value
0 01.01.2020 1 0
1 02.01.2020 1 2
2 03.01.2020 1 3
3 04.01.2020 1 5
4 05.01.2020 1 0
5 01.01.2020 1 22
6 02.01.2020 2 3
7 03.01.2020 2 0
8 04.01.2020 2 11
9 05.01.2020 2 22
10 06.02.2021 3 0
11 07.02.2021 3 0
12 08.02.2021 3 37
13 09.02.2021 3 8
14 10.02.2021 3 0
15 11.02.2021 3 95
results:
d = {
'Date':['01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','01.01.2020','02.01.2020','03.01.2020','04.01.2020','05.01.2020','06.02.2021','07.02.2021','08.02.2021','09.02.2021','10.02.2021','11.02.2021'],
'Sensors': [1,1,1,1,1,1,2,2,2,2,3,3,3,3,3,3],
'Value': [0,2,3,5,0,22,3,0,11,22,0,0,37,8,0,95],
'Count': [0,1,1,1,0,2,1,0,2,2,0,0,1,1,0,2]
}
df = pd.DataFrame(data=d)
df
Date Sensors Value Count
0 01.01.2020 1 0 0
1 02.01.2020 1 2 1
2 03.01.2020 1 3 1
3 04.01.2020 1 5 1
4 05.01.2020 1 0 0
5 01.01.2020 1 22 2
6 02.01.2020 2 3 1
7 03.01.2020 2 0 0
8 04.01.2020 2 11 2
9 05.01.2020 2 22 2
10 06.02.2021 3 0 0
11 07.02.2021 3 0 0
12 08.02.2021 3 37 1
13 09.02.2021 3 8 1
14 10.02.2021 3 0 0
15 11.02.2021 3 95 2
CodePudding user response:
So you want to groupby 'Sensors;', then have a cumulative counter of nonzero 'Value's. First I started with:
df.groupby('Sensors').apply(lambda row: row.assign(Count = (row['Value']==0).cumsum()*(row['Value']!=0) )).reset_index(drop=True)
Date Sensors Value Count
0 01.01.2020 1 0 0
1 02.01.2020 1 2 1
2 03.01.2020 1 3 1
3 04.01.2020 1 5 1
4 05.01.2020 1 0 0
5 01.01.2020 1 22 2
6 02.01.2020 2 3 0 <-- should start at 1 not 0
7 03.01.2020 2 0 0
8 04.01.2020 2 11 1
9 05.01.2020 2 22 1
10 06.02.2021 3 0 0
11 07.02.2021 3 0 0
12 08.02.2021 3 37 2 <-- should start at 1 not 2
13 09.02.2021 3 8 2
14 10.02.2021 3 0 0
15 11.02.2021 3 95 3
This was close, but as you can see slightly off, we need to number the zero'th rows in each group to start at 0 if they're zero, 1 if they're nonzero.
Then I did this aiming to fix it:
df.groupby('Sensors').apply(lambda row: row.assign(Count = (row['Value']!=0).diff() ))
but:
- the diff gives a leading Nan, which you need to
fillna()
from I think the.iloc[0]
value. Finally you can take.cumsum()
. Then do.reset_index()
to drop the groupby's multi-index. Anyway you get the idea. - Also,
(row['Value']!=0)
gives a boolean rather than an int. Doing.diff()
on that boolean only gives True/False(/NaN), not 1/0/-1(/NaN) which is more useful. So maybe(row['Value']!=0).astype(int).diff()
better.