Calculating length of sequence of zeros in Pandas

Time:08-17

I have a table like this:

Unit status date
One 1 1
One 1 2
One 1 3
One 0 4
One 0 5
One 1 6
One 1 7

I want to create a new column that holds the length of each run of consecutive zeros in the status column. For this example, the output would be:

Unit status date gap
One 1 1 0
One 1 2 0
One 1 3 0
One 0 4 2
One 0 5 2
One 1 6 0
One 1 7 0

This needs to be done for every unit in the DataFrame. I based my attempt on this question, but I'm stuck on the part where I assign the total size to all the rows that belong to the gap.

CodePudding user response:

The usual way to label blocks of a given value is to take the cumulative sum of the other values: the sum stays constant across a run of zeros, so it can serve as a group key. Assuming your data is sorted by Unit:

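# zeros do not advance the cumulative sum, so each run of zeros shares one cumsum value
df['gap'] = (df.groupby(['Unit', 'status', df['status'].cumsum()])
             ['status'].transform('size')          # number of rows in each block
             .where(df['status'].eq(0), other=0)   # keep the size only on zero rows
            )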

Output:

  Unit  status  date  gap
0  One       1     1    0
1  One       1     2    0
2  One       1     3    0
3  One       0     4    2
4  One       0     5    2
5  One       1     6    0
6  One       1     7    0
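
To make the grouping step easier to follow, here is a minimal self-contained sketch of the same cumsum technique. The second unit "Two" and its values are made up for illustration, and the intermediate names `block` and `size` are not from the answer above.

```python
import pandas as pd

# Hypothetical sample data: the question's unit plus a made-up unit "Two".
df = pd.DataFrame({
    "Unit":   ["One"] * 7 + ["Two"] * 5,
    "status": [1, 1, 1, 0, 0, 1, 1,   0, 0, 0, 1, 0],
    "date":   [1, 2, 3, 4, 5, 6, 7,   1, 2, 3, 4, 5],
})

# Zeros add nothing to the running sum, so consecutive zeros share one block id.
# The rename only avoids having two group keys called "status".
block = df["status"].cumsum().rename("block")

# Count the rows in each (Unit, status, block) group ...
size = df.groupby(["Unit", "status", block])["status"].transform("size")

# ... and keep that count only on rows where status is 0.
df["gap"] = size.where(df["status"].eq(0), other=0)

print(df)
```

Including Unit in the group keys is what keeps a run of zeros at the end of one unit from merging with zeros at the start of the next.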

CodePudding user response:

Another approach is run-length encoding, using the python-rle package:

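import rle

# encode returns the run values and the matching run lengths
values, counts = rle.encode(df.status)

# keep the run length for runs of zeros, 0 otherwise, then expand back to one entry per row
df['gap'] = rle.decode([c if v == 0 else 0 for v, c in zip(values, counts)], counts)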

Output:

  Unit  status  date  gap
0  One       1     1    0
1  One       1     2    0
2  One       1     3    0
3  One       0     4    2
4  One       0     5    2
5  One       1     6    0
6  One       1     7    0
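
As written, the run-length snippet above appears to encode the whole status column without regard to Unit, so a run of zeros that ends one unit and continues into the next would be counted as a single run. A possible per-unit variant is sketched below. It swaps python-rle for the standard library's `itertools.groupby`, and the helper name `zero_run_lengths` and the sample unit "Two" are hypothetical.

```python
from itertools import groupby

import pandas as pd


def zero_run_lengths(values):
    """For each element, return the length of its run if the value is 0, else 0."""
    out = []
    for value, run in groupby(values):
        n = len(list(run))                        # length of this run
        out.extend([n if value == 0 else 0] * n)  # repeat it once per row in the run
    return out


# Hypothetical data: unit "One" ends with zeros and unit "Two" starts with zeros.
df = pd.DataFrame({
    "Unit":   ["One"] * 4 + ["Two"] * 3,
    "status": [1, 0, 0, 0, 0, 0, 1],
})

# The helper is called once per unit, so a run never crosses a unit boundary.
df["gap"] = df.groupby("Unit")["status"].transform(
    lambda s: pd.Series(zero_run_lengths(s), index=s.index)
)

print(df)
```

Without the per-unit grouping, the five consecutive zeros in this sample would be treated as one run of length 5 instead of runs of 3 and 2.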