Let's say that I have a dataframe as follow:
df = pd.DataFrame({'A':[1,1,1,1,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,0,1,1]})
Then, I convert it into a boolean form:
df.eq(1)
Out[213]:
A
0 True
1 True
2 True
3 True
4 True
5 False
6 False
7 True
8 True
9 False
10 True
11 True
12 True
13 True
14 True
15 False
16 False
17 False
18 False
19 False
20 True
21 True
What I want is to count consecutive sets of True values in the column. In this example, the output would be:
df
Out[215]:
A count
0 1 5.0
1 1 2.0
2 1 5.0
3 1 2.0
4 1 NaN
5 0 NaN
6 0 NaN
7 1 NaN
8 1 NaN
9 0 NaN
10 1 NaN
11 1 NaN
12 1 NaN
13 1 NaN
14 1 NaN
15 0 NaN
16 0 NaN
17 0 NaN
18 0 NaN
19 0 NaN
20 1 NaN
21 1 NaN
My progress has been by using tools as 'groupby' and 'cumsum' but honestly, I can not figure out how to solve it. Thanks in advance
CodePudding user response:
You can use df['A'].diff().ne(0).cumsum()
to generate a grouper that will group each consecutive group of zeros/ones:
# A side-by-side comparison:
>>> pd.concat([df['A'], df['A'].diff().ne(0).cumsum()], axis=1)
A A
0 1 1
1 1 1
2 1 1
3 1 1
4 1 1
5 0 2
6 0 2
7 1 3
8 1 3
9 0 4
10 1 5
11 1 5
12 1 5
13 1 5
14 1 5
15 0 6
16 0 6
17 0 6
18 0 6
19 0 6
20 1 7
21 1 7
Thus, group by that grouper, calculate sums, replace zero with NaN dropna
, and reset the index:
df['count'] = df.groupby(df['A'].diff().ne(0).cumsum()).sum().replace(0, np.nan).dropna().reset_index(drop=True)
Output:
>>> df
A B
0 1 5.0
1 1 2.0
2 1 5.0
3 1 2.0
4 1 NaN
5 0 NaN
6 0 NaN
7 1 NaN
8 1 NaN
9 0 NaN
10 1 NaN
11 1 NaN
12 1 NaN
13 1 NaN
14 1 NaN
15 0 NaN
16 0 NaN
17 0 NaN
18 0 NaN
19 0 NaN
20 1 NaN
21 1 NaN
CodePudding user response:
I propose an alternative way that makes use of the split string function.
Let's transform the Series df.A into a string and then split it where the zeros are.
df = pd.DataFrame({'A':[1,1,1,1,1,0,0,1,1,0,1,1,1,1,1,0,0,0,0,0,1,1]})
ll = ''.join(df.A.astype('str').tolist()).split('0')
The list ll looks like
print(ll)
['11111', '', '11', '11111', '', '', '', '', '11']
now we count the lengths of every string and put it into a list
[len(item) for item in ll if len(item)>0]
This is doable if the Series is not too long.