I am trying to create a column that counts incrementally as it goes down the rows group by one column and based on yet another column.
Additionaly if one of the status is blank (as in an empty string) it would be ignored and repeat the count for that blank from the previous one.
So I have this data
car status
0 audi False
1 audi False
2 audi False
3 audi True
4 bmw False
5 bmw ''
6 bmw False
7 bmw True
8 bmw False
9 lexus True
10 lexus True
11 lexus True
I would like to add a colum that counts how many consecutive False
are in status as it goes counting for each car
and reset when there is a True
to start again if there is another False
or another car
car status counter
0 audi False 1
1 audi False 2
2 audi False 3
3 audi True 0
4 bmw False 1
5 bmw '' 1
5 bmw False 2
6 bmw True 0
7 bmw False 1
9 lexus True 0
10 lexus True 0
11 lexus False 1
I am trying this but it increments by car
import pandas as pd
data = [['audi', False],
['audi', False],
['audi', False],
['audi', True],
['bmw', False],
['bmw', False],
['bmw', False],
['bmw', True],
['bmw', False],
['lexus', True],
['lexus', True],
['lexus', False]]
df = pd.DataFrame(data=data, columns=['car', 'status'])
df['count'] = df.groupby('car')['status'].transform(lambda x: x.ne(x.shift()).cumsum())
print(df)
CodePudding user response:
You can use GroupBy.cumcount
with grouping consecutive values, last set 0
for status=True
values by Series.mask
:
s = df['status'].eq(True)
df['count'] = (df.groupby(['car', s.ne(s.shift()).cumsum()])
.cumcount()
.add(1)
.mask(df['status'], 0))
print(df)
car status count
0 audi False 1
1 audi False 2
2 audi False 3
3 audi True 0
4 bmw False 1
5 bmw False 2
6 bmw False 3
7 bmw True 0
8 bmw False 1
9 lexus True 0
10 lexus True 0
11 lexus False 1
Alternative solution, for working with boolean column only:
s = df['status'].eq(True)
df['count'] = (df.groupby(['car', (~s & s.shift(fill_value=False)).cumsum()])
.cumcount()
.add(1)
.mask(df['status'], 0))