I have a dataframe df with a binary column target, and I want to find every run of at least n consecutive rows such that df["target"] == 1.
I found a lot of answers about computing the occurrences of (exactly) n consecutive rows meeting a given condition on some dataframe columns, but they don't solve my problem.
I could use the fact that the target is finite to build the following algorithm to solve my problem:
import pandas as pd

target = [0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,1,1]
df = pd.DataFrame({"target": target})
n = 3
groups = []
for i in range(len(df)):
    if df["target"].iloc[i] == 0:
        continue
    # extend the run starting at row i as far as the 1s go
    group_index_min = df.index[i]
    for j in range(i, len(df)):
        if df["target"].iloc[j] == 1:
            group_index_max = df.index[j]
        else:
            break
    current_group = (group_index_min, group_index_max)
    # skip runs that are contained in a group already recorded
    is_sub_group = False
    for group in groups:
        a, b = group
        if a <= group_index_min and group_index_max <= b:
            is_sub_group = True
    if (not is_sub_group) and (group_index_max - group_index_min + 1 >= n):
        groups.append(current_group)
groups
# >> [(2, 4), (9, 11), (31, 36)]
However, I would prefer a more pythonic solution that uses the power of numpy or pandas instead of explicit loops.
Can someone help me? Many thanks!
CodePudding user response:
Try:
# keep a run only if it spans at least n rows (n = 3 in the question)
get_group = lambda x: (x.index[0], x.index[-1]) if len(x) >= n else None
groups = df['target'].eq(0).cumsum()[df['target'].ne(0)].to_frame() \
           .groupby('target').apply(get_group).dropna().tolist()
print(groups)
# Output:
# [(2, 4), (9, 11), (31, 36)]
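To see why this works, the chain can be unpacked into named steps. The sketch below is an equivalent rewrite rather than the code above: the names run_id, ones, runs and long_runs are made up for illustration, and it assumes the default integer index and the data from the question.

```python
import pandas as pd

target = [0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,1,1]
df = pd.DataFrame({"target": target})
n = 3

# Every 0 increments the counter, so all rows of one run of 1s share an id.
run_id = df["target"].eq(0).cumsum()

# Index labels of the rows where target is 1, as a Series that can be grouped.
ones = df.index[df["target"].eq(1)].to_series()

# One row per run: first label, last label, and number of rows in the run.
runs = ones.groupby(run_id[ones.index]).agg(start="first", end="last", length="count")

# Keep the runs that are long enough and return them as (start, end) tuples.
long_runs = runs[runs["length"] >= n]
groups = list(zip(long_runs["start"], long_runs["end"]))
print(groups)
# [(2, 4), (9, 11), (31, 36)]
```

Counting rows per run (rather than subtracting labels) means the length check does not depend on the index being consecutive integers.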
CodePudding user response:
is_one = df.target == 1
# first row of each run: a 1 preceded by a 0 (or by the start of the frame)
starts = df.index[is_one & (df.target.shift(fill_value=0) == 0)]
# last row of each run: a 1 followed by a 0 (or by the end of the frame)
ends = df.index[is_one & (df.target.shift(-1, fill_value=0) == 0)]
# pair each start with its end; keeping them separate means an isolated 1 yields (i, i)
groups_ = list(zip(starts, ends))
# keep only groups of minimal length
groups = [(a, b) for a, b in groups_ if b - a + 1 >= n]
The value of groups for the provided input data is [(2, 4), (9, 11), (31, 36)].
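Since the question also mentions numpy, here is a rough numpy-only sketch of the same start/end idea. The function name runs_of_ones is hypothetical, and it returns row positions rather than index labels, so with a non-default index you would still have to map the positions back through df.index.

```python
import numpy as np

def runs_of_ones(values, n):
    """Return (start, end) positions of runs of at least n consecutive 1s."""
    is_one = np.asarray(values) == 1
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], is_one, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(edges == 1)       # first position of each run
    ends = np.flatnonzero(edges == -1) - 1    # last position of each run (inclusive)
    keep = (ends - starts + 1) >= n
    return [(int(a), int(b)) for a, b in zip(starts[keep], ends[keep])]

target = [0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,1,1,1]
print(runs_of_ones(target, 3))
# [(2, 4), (9, 11), (31, 36)]
```

With a dataframe you could call it as runs_of_ones(df["target"].to_numpy(), n).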