I have a DataFrame that looks like the following:
import pandas as pd
df = pd.DataFrame({'a': [True]*5 + [False]*5 + [True]*5, 'b': [False] + [True]*3 + [False] + [True]*5 + [False]*4 + [True]})
a b
0 True False
1 True True
2 True True
3 True True
4 True False
5 False True
6 False True
7 False True
8 False True
9 False True
10 True False
11 True False
12 True False
13 True False
14 True True
How can I select blocks where column a is True only when the interior values over the same rows for column b are True?
I know that I could break the DataFrame apart into consecutive True regions and apply a function to each chunk, but this is for a much larger problem with 10 million rows, and I don't think such a solution would scale very well.
My expected output would be the following:
a b c
0 True False True
1 True True True
2 True True True
3 True True True
4 True False True
5 False True False
6 False True False
7 False True False
8 False True False
9 False True False
10 True False False
11 True False False
12 True False False
13 True False False
14 True True False
CodePudding user response:
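You can group the consecutive runs of equal a values and then check the interior b values of each run with a function, like this:
# Label each run of consecutive equal values in a (the label increments wherever a changes)
groupby_consec_a = df.groupby(df.a.diff().ne(0).cumsum())
# True if every b value in the run, excluding its first and last row, is True
all_interior = lambda x: x.iloc[1:-1].all()
# Keep only rows where a is True and the run's interior b values are all True
df['c'] = df.a & groupby_consec_a.b.transform(all_interior)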
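Check whether this is fast enough on your data. The lambda is called once per run in Python, so it can be slow when there are many runs. If it is too slow, the lambda will have to be replaced with built-in pandas operations, which takes more code.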
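One possible way to drop the lambda is sketched below. It marks the first and last row of each run as automatically passing, so that a built-in 'all' reduction over the run only depends on the interior b values. The names run_id, is_first, is_last and b_or_edge are illustrative and not part of the original answer, and this has not been benchmarked on 10 million rows, so treat it as a starting point rather than a guaranteed speedup.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [True]*5 + [False]*5 + [True]*5,
    'b': [False] + [True]*3 + [False] + [True]*5 + [False]*4 + [True],
})

# Label each run of consecutive equal values in a.
run_id = df['a'].ne(df['a'].shift()).cumsum()

# A row is a run boundary if it is the first or the last row of its run.
is_first = run_id.ne(run_id.shift())
is_last = run_id.ne(run_id.shift(-1))

# Treat boundary rows as passing so that only interior b values can fail.
b_or_edge = df['b'] | is_first | is_last

# Built-in reduction, broadcast back to every row of the run.
interior_ok = b_or_edge.groupby(run_id).transform('all')

# Keep only runs where a is True and all interior b values are True.
df['c'] = df['a'] & interior_ok
print(df)
```

As with the lambda version, runs of one or two rows have no interior rows and therefore pass the interior check.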