I have this df:
x
0 2
1 2
2 2
3 1
4 1
5 2
6 2
I need to compare the current value in column x with the n previous and the n next values, based on a defined condition. If the condition is met at least q times, add 1 to a new column; if not, add 0.
For instance, suppose n is 2, q is 3 and the condition is current_value <= value / 2. In this case, the code will do 7 comparisons:
1st comparison: compare current_value = 2 to the previous n = 2 numbers (there are none, because this is the first value in the column) and then to the next n = 2 values (both are 2, so the condition 2 <= 2/2 is not met for either). The condition is met 0 times; since 0 < q = 3, the code adds 0 to the new column.
2nd comparison: compare current_value = 2 to the previous n = 2 numbers (there is just one number above, and the condition 2 <= 2/2 is not met) and then to the next n = 2 values (a 2 and then a 1, so the condition is not met: neither 2 <= 2/2 nor 2 <= 1/2 holds). The condition is met 0 times; since 0 < q = 3, the code adds 0 to the new column.
3rd comparison: the condition is met 0 times; since 0 < q = 3, the code adds 0 to the new column.
4th comparison: compare current_value = 1 to the previous n = 2 numbers (there are two 2s above, and the condition 1 <= 2/2 is met for both) and then to the next n = 2 values (a 1 and then a 2, so the condition is met once: 1 <= 2/2 holds but 1 <= 1/2 does not). The condition is met 3 times; since 3 >= q = 3, the code adds 1 to the new column.
5th comparison: the condition is met 3 times; since 3 >= q = 3, the code adds 1 to the new column.
6th comparison: the condition is met 0 times; since 0 < q = 3, the code adds 0 to the new column.
7th comparison: the condition is met 0 times; since 0 < q = 3, the code adds 0 to the new column.
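The seven comparisons above can be reproduced with a plain loop, which may be useful as a reference for checking any faster solution. This is only a minimal sketch: the helper name count_matches is hypothetical, and it assumes "met q times" means "met at least q times", as in the walkthrough.

```python
def count_matches(values, n):
    """For each position, count the neighbours v (up to n before and
    n after) for which current <= v / 2 holds."""
    counts = []
    for i, current in enumerate(values):
        before = values[max(0, i - n):i]   # up to n previous values
        after = values[i + 1:i + 1 + n]    # up to n next values
        counts.append(sum(current <= v / 2 for v in before + after))
    return counts


x = [2, 2, 2, 1, 1, 2, 2]  # the values of column x
n, q = 2, 3

counts = count_matches(x, n)
comparison = [int(c >= q) for c in counts]

print(counts)      # [0, 0, 0, 3, 3, 0, 0]
print(comparison)  # [0, 0, 0, 1, 1, 0, 0]
```

With a DataFrame, the same list could be built from df['x'].tolist() and assigned to df['comparison'].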
Desired result:
x comparison
0 2 0
1 2 0
2 2 0
3 1 1
4 1 1
5 2 0
6 2 0
I was thinking of using something like the shift function, but I'm not sure how to implement it. Any help?
CodePudding user response:
I suggest using numpy here, to benefit from its sliding window view:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv
n = 2
q = 3
# convert to numpy array
a = df['x'].astype(float).to_numpy()
# create a sliding window
# remove central value, divide by 2
# compare to original value
# count number of matches
count = (a[:,None] <= swv(np.pad(a, n, constant_values=np.nan), 2*n+1)[:, np.r_[:n,n+1:2*n+1]]/2).sum(1)
# array([0, 0, 0, 3, 3, 0, 0])
# compare number of matches to q
df['comparison'] = (count >= q).astype(int)
print(df)
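To make the one-liner above easier to follow, here is the same idea unpacked into named steps. This is a sketch, not a different method: the intermediate names (padded, windows, neighbours, matches) are illustrative, the input array is hard-coded from the question's data, and np.delete is used in place of the np.r_ column selection to drop the central column.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n, q = 2, 3
a = np.array([2, 2, 2, 1, 1, 2, 2], dtype=float)  # values of df['x']

# pad both ends with NaN so every position has a full window
padded = np.pad(a, n, constant_values=np.nan)

# one row per original value, each row holding 2*n+1 consecutive values
windows = sliding_window_view(padded, 2 * n + 1)   # shape (7, 5)

# drop the central column (the current value itself)
neighbours = np.delete(windows, n, axis=1)          # shape (7, 4)

# comparisons against NaN are False, so padding never counts as a match
matches = a[:, None] <= neighbours / 2
count = matches.sum(axis=1)

comparison = (count >= q).astype(int)
print(count)       # [0 0 0 3 3 0 0]
print(comparison)  # [0 0 0 1 1 0 0]
```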
A pandas-only alternative requires computing two rolling windows (forward and backward), as it's not trivial to access the current index in a centered rolling window with min_periods=1:
n = 2
q = 3
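# backward window: compare the last value of each window to the values before it
s1 = df['x'].rolling(n+1, min_periods=2).apply(lambda x: sum(x.iloc[-1]<=x.iloc[:-1]/2))
# forward window: same computation on the reversed column
s2 = df.loc[::-1, 'x'].rolling(n+1, min_periods=2).apply(lambda x: sum(x.iloc[-1]<=x.iloc[:-1]/2))
df['comparison'] = s1.add(s2, fill_value=0).ge(q).astype(int)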
Output:
x comparison
0 2 0
1 2 0
2 2 0
3 1 1
4 1 1
5 2 0
6 2 0
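Since the question mentions shift, a shift-based variant is also possible for small n. The following is a sketch under the same assumptions (condition current_value <= value / 2, threshold "at least q"); the names offsets and count are illustrative. It builds one shifted copy of the column per neighbour offset, so it does 2*n vectorised comparisons.

```python
import pandas as pd

df = pd.DataFrame({"x": [2, 2, 2, 1, 1, 2, 2]})
n, q = 2, 3

x = df["x"]

# offsets -n..-1 look at the next values, 1..n at the previous ones
offsets = [k for k in range(-n, n + 1) if k != 0]

# shift() fills the edges with NaN, and any comparison with NaN is False,
# so missing neighbours never count as a match
count = sum((x <= x.shift(k) / 2).astype(int) for k in offsets)

df["comparison"] = (count >= q).astype(int)
print(df)
```

For large n the sliding-window approach is likely the better fit, since this version creates a new shifted Series for every offset.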