I have a DataFrame of the form
df = pd.DataFrame(np.array([[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True], [9, 10, False], [11, 12, True], [13, 14, True]]), columns=['a', 'b', 'c'])
and I want to collect every run of more than one consecutive True value in column c into a list. More specifically, the output should be a list whose first item is df.loc[0:1] and whose second item is df.loc[5:6]. Could you please suggest a way to do that using built-in functions?
CodePudding user response:
Assuming this input:
df = pd.DataFrame([[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True], [9, 10, False], [11, 12, True], [13, 14, True]], columns=['a', 'b', 'c'])
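The input is restated here because building the frame through np.array, as in the question, coerces the mixed int/bool rows to a single integer dtype, so column c would hold 1/0 instead of True/False. If you need to keep the np.array construction, a minimal sketch of casting the column back (df_q is just an illustrative name, not from the question):

```python
import numpy as np
import pandas as pd

# Same construction as in the question: the mixed int/bool array becomes all integers
df_q = pd.DataFrame(
    np.array([[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True],
              [9, 10, False], [11, 12, True], [13, 14, True]]),
    columns=['a', 'b', 'c'],
)
print(df_q['c'].dtype)  # an integer dtype, not bool

# Cast the flag column back to bool so that & and ~ act as boolean operators
df_q['c'] = df_q['c'].astype(bool)
print(df_q['c'].dtype)
```

After the cast, df_q should behave like the list-based input above.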
You can use groupby on the rows where c is True, starting a new group at each True that does not follow another True:
m = (df['c']&~df['c'].shift(fill_value=False)).cumsum()
out = [d for _,d in df[df['c']].groupby(m) if len(d)>1]
Output:
[   a  b     c
 0  1  2  True
 1  3  4  True,
     a   b     c
 5  11  12  True
 6  13  14  True]
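To see why the mask works, here is the one-liner split into its intermediate steps. This is only a sketch of the same logic; the names prev, starts and run_id are mine and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True],
                   [9, 10, False], [11, 12, True], [13, 14, True]],
                  columns=['a', 'b', 'c'])

# Flag of the previous row (the first row has no predecessor, so fill with False)
prev = df['c'].shift(fill_value=False)
# True only on the first row of each run of True values
starts = df['c'] & ~prev
# Running count of run starts: all rows of one run share the same id
run_id = starts.cumsum()

print(pd.DataFrame({'c': df['c'], 'starts': starts, 'run_id': run_id}))

# Keep only the True rows, group them by run id, and drop runs of length 1
out = [d for _, d in df[df['c']].groupby(run_id) if len(d) > 1]
```

The False rows also receive a run id, which is why the frame is filtered with df[df['c']] before grouping.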
CodePudding user response:
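You can try itertools.groupby, which groups consecutive equal values of c:
from itertools import groupby
from operator import itemgetter
# pair each index label with its flag and group consecutive equal flags
for k, g in groupby(zip(df.index, df['c']), itemgetter(1)):
    gs = list(g)
    # keep only runs of True that span more than one row
    if k and len(gs) > 1:
        idxs = list(map(itemgetter(0), gs))
        print(idxs)
Output:
[0, 1]
[5, 6]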
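The loop above only prints the index labels. If you want the list of sub-frames asked for in the question, one possible extension is to select the rows with df.loc and collect them. This is a sketch; the out list is my addition, not part of the answer above.

```python
from itertools import groupby
from operator import itemgetter

import pandas as pd

df = pd.DataFrame([[1, 2, True], [3, 4, True], [5, 6, False], [7, 8, True],
                   [9, 10, False], [11, 12, True], [13, 14, True]],
                  columns=['a', 'b', 'c'])

out = []
# Group consecutive (index, flag) pairs by the flag value
for k, g in groupby(zip(df.index, df['c']), key=itemgetter(1)):
    gs = list(g)
    # Keep only runs of True that span more than one row
    if k and len(gs) > 1:
        idxs = [i for i, _ in gs]
        out.append(df.loc[idxs])

print(out)
```

Selecting by label with df.loc means this also works when the index is not a plain 0..n-1 range, as long as the labels are unique.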