I have a float column in a dataframe. And I want to add another boolean column which will be True if condition satisfies on two consecutive values till another condition satisfies on next two consecutive values.
For Example I have a data-frame which look like this:
index | Values % |
---|---|
0 | 0 |
1 | 5 |
2 | 11 |
3 | 9 |
4 | 14 |
5 | 18 |
6 | 30 |
7 | 54 |
8 | 73 |
9 | 100 |
10 | 100 |
11 | 100 |
12 | 100 |
13 | 100 |
Now I want to mark True from where two consecutive values satisfies the condition df['Values %'] >= 10
till next two consecutive values satisfies the next condition i.e. df[Values %] == 100
.
So the final result will look like something this:
index | Values % | Flag |
---|---|---|
0 | 0 | False |
1 | 5 | False |
2 | 11 | False |
3 | 9 | False |
4 | 14 | False |
5 | 18 | True |
6 | 30 | True |
7 | 54 | True |
8 | 73 | True |
9 | 100 | True |
10 | 100 | True |
11 | 100 | False |
12 | 100 | False |
13 | 100 | False |
CodePudding user response:
Not sure how exactly the second part of your question is supposed to work but here is how to achieve the first.
example data
s = pd.Series([0,5,11,9,14,18,2,14,16,18])
solution
# create true/false series for first condition and take cumulative sum
x = (s >= 10).cumsum()
# compare each element of x with 2 elements before. There will be a difference of 2 for elements which belong to streak of 2 or more True
condition = x - x.shift(2) == 2
condition
looks like this
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 True
9 True
dtype: bool
CodePudding user response:
I have a rather inefficient way of doing this. It's not vectorised, so not ideal, but it works:
# Convert the values column to a 1D NumPy array for ease of use.
values = df["Values %"].tolist()
values_np = np.array(values)
# Initialize flags 1D array to be the same size as values_np. Initially set to all 0s. Uses int form of booleans, i.e. 0 = False and 1 = True.
flags = np.zeros((values_np.shape[0]), dtype=int)
# Iterate from 1st (not 0th) row to last row.
for i in range(1, values_np.shape[0]):
# First set flag to 1 (True) if meets the condition that consecutive values are both >= 10.
if values_np[i] >= 10 and values_np[i-1] >= 10:
flags[i] = 1
# Then if consecutive values are both larger than 100, set flag to 0 (False).
if values_np[i] >= 100 and values_np[i-1] >= 100:
flags[i] = 0
# Turn flags into boolean form (i.e. convert 0 and 1 to False and True).
flags = flags.astype(bool)
# Add flags as a new column in df.
df["Flags"] = flags
One thing -- my method gives False for row 10, because both row 9 and row 10 >= 100. If this is not what you wanted, let me know and I can change it so that the flag is True only if the previous two values and the current value (3 consecutive values) are all >= 100.