Group dataframe rows into a list

I have a data frame of the form

df = pd.DataFrame(np.array([[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True], [9, 10, False], [11, 12, True], [13, 14, True]]), columns=['a', 'b', 'c'])

and I want to group the rows that belong to a run of more than one consecutive True value into a list. More specifically, the output should be a list whose first item is df.loc[0:1] and whose second item is df.loc[5:6]. Could you please suggest a way to do that using built-in functions?

CodePudding user response：

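Assuming this input (built from a plain list so that column c keeps its bool dtype; wrapping the rows in np.array would coerce True/False to 1/0):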

df = pd.DataFrame([[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True], [9, 10, False], [11, 12, True], [13, 14, True]], columns=['a', 'b', 'c'])

You can use groupby, starting a new group at each True that does not follow another True:

m = (df['c']&~df['c'].shift(fill_value=False)).cumsum()

out = [d for _,d in df[df['c']].groupby(m) if len(d)>1]

Output:

[   a  b     c
 0  1  2  True
 1  3  4  True,
     a   b     c
 5  11  12  True
 6  13  14  True]
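
To see why the grouping key works, it may help to print the intermediate values. The sketch below uses the same input as above; the name starts is only an illustrative label for the first half of the expression and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True],
     [9, 10, False], [11, 12, True], [13, 14, True]],
    columns=['a', 'b', 'c'],
)

# True only on the first row of each run of True values:
# the row is True and the previous row is not.
starts = df['c'] & ~df['c'].shift(fill_value=False)

# Running count of run starts, so every row of a run shares one label.
m = starts.cumsum()

print(starts.tolist())  # [True, False, False, True, False, True, False]
print(m.tolist())       # [1, 1, 1, 2, 2, 3, 3]
```

Note that the False rows inherit the label of the preceding run, which is why the answer filters with df[df['c']] before grouping, and why the len(d) > 1 check is what drops the isolated True at index 3.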

CodePudding user response：

You can also try itertools.groupby, which groups consecutive equal values, and keep only the True runs longer than one row:

from itertools import groupby
from operator import itemgetter

for k, g in groupby(zip(df.index, df['c']), itemgetter(1)):
    gs = list(g)
    if k and len(gs) > 1:
        idxs = list(map(itemgetter(0), gs))
        print(idxs)

Output:

[0, 1]
[5, 6]
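
The loop above only prints index labels. If you need the list of sub-frames asked for in the question, one possible way is to pass each label list to df.loc. This is a minimal sketch under the same input as the first answer; the variable name out is an assumption chosen to mirror that answer.

```python
from itertools import groupby
from operator import itemgetter

import pandas as pd

df = pd.DataFrame(
    [[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True],
     [9, 10, False], [11, 12, True], [13, 14, True]],
    columns=['a', 'b', 'c'],
)

out = []
# groupby yields one (key, run) pair per run of consecutive equal values in c
for k, g in groupby(zip(df.index, df['c']), key=itemgetter(1)):
    idxs = [i for i, _ in g]        # index labels of the rows in this run
    if k and len(idxs) > 1:         # keep True runs longer than one row
        out.append(df.loc[idxs])    # label-based selection of those rows

print([d.index.tolist() for d in out])  # [[0, 1], [5, 6]]
```

Because the selection uses index labels rather than positions, this should also work when the data frame does not have a default RangeIndex, provided the labels are unique.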