groupby streak of numbers in one column of pandas dataframe

Time:07-25

This is my dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        'b': [0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
    }
)

And this is the way that I want to group it:

2   1  1
3   0  1
4   0  1
5   0  1
6   0  0
7   0  0

9   1  0
10  0  1

13  1  1
14  0  1
15  0  1
16  0  0
17  0  1

I want to group this dataframe based on the values in column b. Each group starts at a row where column a is 1. From there, the group continues until the first 0 in column b, and it also includes the one row after that 0. If a is 1 and b is 0 in the starting row, the group continues for only one more row. In short: stop at the first 0 in column b, then take one more row after that 0.

I have tried the approaches in these two posts: post1, post2, but I still cannot solve this.

I have also tried grouping by df.b.diff().cumsum(), but it doesn't give me what I want.

CodePudding user response:

Use cumsum to create a helper Series for filtering/grouping, then subfilter each group with a boolean mask:

group = df['a'].cumsum()

for k, g in df[group>0].groupby(group):
    # keep rows up to and including the row after the first 0 in b
    m = g['b'].ne(0).cummin().shift(2, fill_value=True)
    print(g[m])

Output:

   a  b
2  1  1
3  0  1
4  0  1
5  0  1
6  0  0
7  0  0

    a  b
9   1  0
10  0  1

    a  b
13  1  1
14  0  1
15  0  1
16  0  0
17  0  1
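
To see why the mask works, it may help to look at the intermediate steps for a single group. The sketch below rebuilds the last group (rows 13 to 18 of the question's dataframe) by hand; the names nonzero, before_zero and keep are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# One group from the example: rows 13-18 of the question's dataframe.
g = pd.DataFrame(
    {'a': [1, 0, 0, 0, 0, 0], 'b': [1, 1, 1, 0, 1, 0]},
    index=range(13, 19),
)

nonzero = g['b'].ne(0)          # True where b is not 0
before_zero = nonzero.cummin()  # stays True only until the first 0 in b
# Shifting by 2 extends the True run by two rows:
# the row holding the first 0 and the row right after it.
keep = before_zero.shift(2, fill_value=True)

print(pd.DataFrame({'b': g['b'], 'nonzero': nonzero,
                    'before_zero': before_zero, 'keep': keep}))
print(g[keep])
```

Only row 18 ends up with keep set to False, so rows 13 to 17 survive, which matches the third group in the output above.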

As single dataframe:

Either run the above and concat, or:

group = df['a'].cumsum()
# group_keys=False keeps the original row index so that m aligns with df
m = df['b'].ne(0).groupby(group, group_keys=False).apply(lambda x: x.cummin().shift(2, fill_value=True))

out = df[group.gt(0)&m]

Output:

    a  b
2   1  1
3   0  1
4   0  1
5   0  1
6   0  0
7   0  0
9   1  0
10  0  1
13  1  1
14  0  1
15  0  1
16  0  0
17  0  1
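
As a cross-check, the same rule can be written as a plain loop. This is only a sketch: streak_groups is a hypothetical helper, and it assumes (as the cumsum grouping above does) that a group never runs past the next 1 in column a.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        'b': [0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0]
    }
)

def streak_groups(frame):
    """Hypothetical helper: return one sub-frame per 1 in column a."""
    a = frame['a'].tolist()
    b = frame['b'].tolist()
    starts = [i for i, v in enumerate(a) if v == 1]
    groups = []
    for n, start in enumerate(starts):
        # Assumption: a group may not run past the next 1 in a.
        limit = starts[n + 1] if n + 1 < len(starts) else len(a)
        stop = start
        # Advance to the first 0 in b (or to the end of the group).
        while stop < limit and b[stop] != 0:
            stop += 1
        # Include the row with the 0 and one row after it.
        end = min(stop + 2, limit)
        groups.append(frame.iloc[start:end])
    return groups

groups = streak_groups(df)
kept = pd.concat(groups)
print(kept)
```

The index of kept can then be compared with the index of out from the vectorized version; for the sample data both should contain the same rows.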