Mark True from conditions satisfy on two consecutive values till another two consecutive values-CodePudding

I have a float column in a dataframe. And I want to add another boolean column which will be True if condition satisfies on two consecutive values till another condition satisfies on next two consecutive values.

For Example I have a data-frame which look like this:

index	Values %
0	0
1	5
2	11
3	9
4	14
5	18
6	30
7	54
8	73
9	100
10	100
11	100
12	100
13	100

Now I want to mark True from where two consecutive values satisfies the condition df['Values %'] >= 10 till next two consecutive values satisfies the next condition i.e. df[Values %] == 100.

So the final result will look like something this:

index	Values %	Flag
0	0	False
1	5	False
2	11	False
3	9	False
4	14	False
5	18	True
6	30	True
7	54	True
8	73	True
9	100	True
10	100	True
11	100	False
12	100	False
13	100	False

CodePudding user response：

Not sure how exactly the second part of your question is supposed to work but here is how to achieve the first.

example data

s = pd.Series([0,5,11,9,14,18,2,14,16,18])

solution

# create true/false series for first condition and take cumulative sum
x = (s >= 10).cumsum()

# compare each element of x with 2 elements before.  There will be a difference of 2 for elements which belong to streak of 2 or more True
condition = x - x.shift(2) == 2

condition looks like this

0    False
1    False
2    False
3    False
4    False
5     True
6    False
7    False
8     True
9     True
dtype: bool

CodePudding user response：

I have a rather inefficient way of doing this. It's not vectorised, so not ideal, but it works:

# Convert the values column to a 1D NumPy array for ease of use.
values = df["Values %"].tolist()
values_np = np.array(values)

# Initialize flags 1D array to be the same size as values_np. Initially set to all 0s. Uses int form of booleans, i.e. 0 = False and 1 = True.
flags = np.zeros((values_np.shape[0]), dtype=int)

# Iterate from 1st (not 0th) row to last row.
for i in range(1, values_np.shape[0]):
    # First set flag to 1 (True) if meets the condition that consecutive values are both >= 10.
    if values_np[i] >= 10 and values_np[i-1] >= 10:
        flags[i] = 1

    # Then if consecutive values are both larger than 100, set flag to 0 (False).
    if values_np[i] >= 100 and values_np[i-1] >= 100:
        flags[i] = 0

# Turn flags into boolean form (i.e. convert 0 and 1 to False and True).
flags = flags.astype(bool)

# Add flags as a new column in df.
df["Flags"] = flags

One thing -- my method gives False for row 10, because both row 9 and row 10 >= 100. If this is not what you wanted, let me know and I can change it so that the flag is True only if the previous two values and the current value (3 consecutive values) are all >= 100.