I have the following column with True and False boolean values. I want to create a new column that performs a cumulative sum on the True values and resets the count whenever the value is False, like this:
    bool  count
0  False      0
1   True      1
2   True      2
3   True      3
4  False      0
5   True      1
6   True      2
7  False      0
8  False      0
CodePudding user response:
Yes, this can be done, using a series of steps:
df['count'] = df.groupby(df['bool'].astype(int).diff().ne(0).cumsum())['bool'].cumsum()
Output:
>>> df
    bool  count
0  False      0
1   True      1
2   True      2
3   True      3
4  False      0
5   True      1
6   True      2
7  False      0
8  False      0
Explanation:
This code puts each run of consecutive identical values into its own group, so every streak of trues (1's) is separated from the falses (0's) around it. Then, treating the trues as 1's and the falses as 0's, it computes the cumulative sum within each group and concatenates the results together.
df.groupby(...)
- df['bool'].astype(int) - Takes each value of bool and converts it to an int (true -> 1, false -> 0).
- .diff() - For each integer value, computes the difference between it and the previous value (so if the prev val was False and this is True, 1 (1 - 0); if prev was True and this is True, 0 (1 - 1); etc.). The first row has no previous value, so its difference is NaN.
- .ne(0) - Converts all values that are not equal to 0 to true, and zeros to false (because (0 != 0) == False). A true here marks a row where the value changed; the NaN in the first row also counts as not equal to 0.
- .cumsum() - Calculates the cumulative sum of those true (1) markers. This way, each run of consecutive equal values gets its own unique number, which is passed to the groupby() call, thus grouping separately each group of trues before a false.
['bool'].cumsum()
- From each group of consecutive true values (1), get the cumulative sum of those 1s. Groups of falses sum to 0.
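To make the steps above easier to follow, here is a minimal sketch that splits the one-liner into intermediate variables. The names as_int, changed and group_id are only illustrative and do not appear in the original answer, and the sample frame is rebuilt from the data in the question. The final cumsum is taken on the integer version of the column, which is an assumption made here to keep the result an integer column regardless of pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"bool": [False, True, True, True, False, True, True, False, False]}
)

# Step 1: True -> 1, False -> 0.
as_int = df["bool"].astype(int)

# Step 2: mark rows whose value differs from the previous row.
# The first row's diff is NaN, and NaN != 0, so it is marked too.
changed = as_int.diff().ne(0)

# Step 3: a running total of the markers gives one id per run
# of consecutive equal values.
group_id = changed.cumsum()

# Step 4: cumulative sum of the 0/1 values inside each run.
df["count"] = as_int.groupby(group_id).cumsum()

print(df.assign(group_id=group_id))
```

Printing group_id next to the result shows that every streak of trues and every streak of falses lands in its own group, which is why the count restarts after each False.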
CodePudding user response:
There might be a more Pythonic way to do it, but here is a simple iterative approach:
current_count = 0
for index, row in data.iterrows():
    if row['bool']:
        # Extend the current streak of True values.
        current_count += 1
    else:
        # A False value resets the streak.
        current_count = 0
    data.at[index, 'count'] = current_count
Here, data is the dataframe that initially has one column named bool, then we add the count column!
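As a variation on the same idea, the running count can be collected in a plain list while looping over the column values, and assigned once at the end. This is only a sketch: the counts list is a name introduced here for illustration, and the sample frame is rebuilt from the data in the question.

```python
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame(
    {"bool": [False, True, True, True, False, True, True, False, False]}
)

counts = []          # illustrative helper list, one entry per row
current_count = 0
for flag in data["bool"]:
    # Grow the streak on True, reset it on False.
    current_count = current_count + 1 if flag else 0
    counts.append(current_count)

# Assign the whole column at once so it keeps an integer dtype.
data["count"] = counts
print(data)
```

Looping over the column directly skips building a row object on every iteration, and assigning the finished list avoids writing into the frame one cell at a time.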