I have a table like this
Unit | status | date |
---|---|---|
One | 1 | 1 |
One | 1 | 2 |
One | 1 | 3 |
One | 0 | 4 |
One | 0 | 5 |
One | 1 | 6 |
One | 1 | 7 |
and I want to create a new column containing the length of each consecutive run of zeros in the status
column. So for that example, the output would be
Unit | status | date | gap |
---|---|---|---|
One | 1 | 1 | 0 |
One | 1 | 2 | 0 |
One | 1 | 3 | 0 |
One | 0 | 4 | 2 |
One | 0 | 5 | 2 |
One | 1 | 6 | 0 |
One | 1 | 7 | 0 |
This would have to be done for all the units in the DataFrame. I was basing my approach on this question, but I'm stuck at the part where I assign the total run length to all the rows that are part of the gap.
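For reference, a minimal DataFrame matching the example can be built like this (just a sketch of the sample data, assuming plain integer columns):

import pandas as pd

# sample data from the question; 'gap' is the column to be computed
df = pd.DataFrame({'Unit': ['One'] * 7,
                   'status': [1, 1, 1, 0, 0, 1, 1],
                   'date': [1, 2, 3, 4, 5, 6, 7]})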
CodePudding user response:
The usual way to group blocks of one value is to take the cumulative sum of the other values: here the cumsum of status only increases on the 1s, so every row in a run of 0s shares the same key. Given that your data is sorted by Unit:
# group rows by Unit, status and the cumsum block key, take each group's size,
# and keep that size only where status is 0
df['gap'] = (df.groupby(['Unit', 'status', df['status'].cumsum()])
               ['status'].transform('size')
               .where(df['status'].eq(0), other=0)
             )
Output:
Unit status date gap
0 One 1 1 0
1 One 1 2 0
2 One 1 3 0
3 One 0 4 2
4 One 0 5 2
5 One 1 6 0
6 One 1 7 0
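To see why the cumsum acts as a block key, inspect it on the sample data (illustrative only):

df['status'].cumsum()
# 0    1
# 1    2
# 2    3
# 3    3
# 4    3
# 5    4
# 6    5
# the sum stays flat across the 0s, so rows 3 and 4 share the key 3;
# including 'status' in the groupby keys keeps row 2 (status 1, also key 3)
# in a separate group, so rows 3-4 form their own group of size 2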
CodePudding user response:
Another approach could be to use run-length encoding via the python-rle package:
import rle

# rle.encode returns (values, counts); keep the run length for 0-runs, 0 otherwise
r = rle.encode(df.status)
df['gap'] = rle.decode([r[1][x] if r[0][x] == 0 else 0 for x in range(len(r[0]))],
                       r[1])
Output:
Unit status date gap
0 One 1 1 0
1 One 1 2 0
2 One 1 3 0
3 One 0 4 2
4 One 0 5 2
5 One 1 6 0
6 One 1 7 0
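For reference, the intermediate run-length encoding of the status column looks roughly like this (assuming the package returns a (values, counts) pair, which is what the code above relies on):

rle.encode(df.status)  # -> ([1, 0, 1], [3, 2, 2])
# the list comprehension keeps the run length only for the 0-runs: [0, 2, 0]
# rle.decode([0, 2, 0], [3, 2, 2]) then expands it to [0, 0, 0, 2, 2, 0, 0]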