Home > Software engineering >  Column increment counter and reset based on other columns
Column increment counter and reset based on other columns

Time:04-25

I am trying to create a column that counts incrementally as it goes down the rows group by one column and based on yet another column.

Additionaly if one of the status is blank (as in an empty string) it would be ignored and repeat the count for that blank from the previous one.

So I have this data

      car  status  
0    audi   False      
1    audi   False      
2    audi   False      
3    audi    True      
4     bmw   False     
5     bmw   ''      
6     bmw   False      
7     bmw    True      
8     bmw   False      
9   lexus    True      
10  lexus    True      
11  lexus    True

I would like to add a colum that counts how many consecutive False are in status as it goes counting for each car and reset when there is a True to start again if there is another False or another car

      car  status  counter
0    audi   False  1
1    audi   False  2   
2    audi   False  3   
3    audi    True  0   
4     bmw   False  1   
5     bmw   ''     1 
5     bmw   False  2    
6     bmw    True  0    
7     bmw   False  1         
9   lexus    True  0
10  lexus    True  0   
11  lexus   False  1

I am trying this but it increments by car

import pandas as pd

data = [['audi', False],
        ['audi', False],
        ['audi', False],
        ['audi', True],
        ['bmw', False],
        ['bmw', False],
        ['bmw', False],
        ['bmw', True],
        ['bmw', False],
        ['lexus', True],
        ['lexus', True],
        ['lexus', False]]

df = pd.DataFrame(data=data, columns=['car', 'status'])

df['count'] = df.groupby('car')['status'].transform(lambda x: x.ne(x.shift()).cumsum())

print(df)

CodePudding user response:

You can use GroupBy.cumcount with grouping consecutive values, last set 0 for status=True values by Series.mask:

s = df['status'].eq(True)
df['count'] = (df.groupby(['car', s.ne(s.shift()).cumsum()])
                 .cumcount()
                 .add(1)
                 .mask(df['status'], 0))

print(df)
      car  status  count
0    audi   False      1
1    audi   False      2
2    audi   False      3
3    audi    True      0
4     bmw   False      1
5     bmw   False      2
6     bmw   False      3
7     bmw    True      0
8     bmw   False      1
9   lexus    True      0
10  lexus    True      0
11  lexus   False      1

Alternative solution, for working with boolean column only:

s = df['status'].eq(True)
df['count'] = (df.groupby(['car', (~s & s.shift(fill_value=False)).cumsum()])
                 .cumcount()
                 .add(1)
                 .mask(df['status'], 0))
  • Related