Compare current value with n values above and below on Pandas DataFrame

Time:11-07

I have this df:

   x
0  2
1  2
2  2
3  1
4  1
5  2
6  2

I need to compare the current value in column x with the n previous and n next values using a defined condition. If the condition is met at least q times, add 1 to a new column; otherwise, add 0.

For instance, suppose n is 2, q is 3, and the condition is current_value <= value / 2, where value is each neighbouring value. In this case the code performs 7 comparisons, one per row:

1st comparison: compare current_value = 2 to the previous n = 2 values (there are none, because this is the first value in the column), then compare current_value = 2 to the next n = 2 values (both are 2, so the condition is not met for either: 2 <= 2/2 is false). No conditions are met; since 0 < q = 3, the code adds 0 to the new column.

2nd comparison: compare current_value = 2 to the previous n = 2 values (there is only one value above, a 2, and the condition is not met: 2 <= 2/2 is false), then compare current_value = 2 to the next n = 2 values (a 2 and then a 1, so the condition is not met for either: 2 <= 2/2 and 2 <= 1/2 are both false). No conditions are met; since 0 < q = 3, the code adds 0 to the new column.

3rd comparison: no conditions are met; since 0 < q = 3, the code adds 0 to the new column.

4th comparison: compare current_value = 1 to the previous n = 2 values (two 2s, and the condition is met for both: 1 <= 2/2), then compare current_value = 1 to the next n = 2 values (a 1 and then a 2, so the condition is met once: 1 <= 1/2 is false, 1 <= 2/2 is true). Three conditions are met; since 3 >= q = 3, the code adds 1 to the new column.

5th comparison: three conditions are met; since 3 >= q = 3, the code adds 1 to the new column.

6th comparison: no conditions are met; since 0 < q = 3, the code adds 0 to the new column.

7th comparison: no conditions are met; since 0 < q = 3, the code adds 0 to the new column.
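The rule walked through above can be written as a plain Python double loop. This is only a reference sketch for checking faster versions against; count_matches is a hypothetical helper name, and it assumes the condition from the example (current <= neighbour / 2):

```python
import pandas as pd

def count_matches(values, n):
    """Count neighbours within n positions where current <= neighbour / 2."""
    counts = []
    for i, current in enumerate(values):
        # Up to n values before and up to n values after, current value excluded
        lo = max(0, i - n)
        neighbours = list(values[lo:i]) + list(values[i + 1:i + 1 + n])
        counts.append(sum(current <= v / 2 for v in neighbours))
    return counts

df = pd.DataFrame({'x': [2, 2, 2, 1, 1, 2, 2]})
n, q = 2, 3

counts = count_matches(df['x'].tolist(), n)
print(counts)  # [0, 0, 0, 3, 3, 0, 0]

# Flag rows where the condition held at least q times
df['comparison'] = [int(c >= q) for c in counts]
print(df)
```

Its cost is proportional to len(x) * n, so it is best kept for small frames or tests.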

Desired result:

   x comparison
0  2           0
1  2           0
2  2           0
3  1           1
4  1           1
5  2           0
6  2           0

I was thinking of using something like the shift function, but I'm not sure how to implement it. Any help?

Answer:

I suggest using numpy here, to benefit from its sliding window view:

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view as swv

df = pd.DataFrame({'x': [2, 2, 2, 1, 1, 2, 2]})

n = 2
q = 3

# convert to numpy array
# (float, so that it can hold the NaN padding)
a = df['x'].astype(float).to_numpy()

# create a sliding window
# remove central value, divide by 2
# compare to original value
# count number of matches
count = (a[:,None] <= swv(np.pad(a, n, constant_values=np.nan), 2*n+1)[:, np.r_[:n,n+1:2*n+1]]/2).sum(1)
# array([0, 0, 0, 3, 3, 0, 0])

# compare number of matches to q
df['comparison'] = (count >= q).astype(int)

print(df)
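To see what the sliding window step produces, it can help to inspect the intermediate arrays on the example data. This sketch reuses the same calls as the answer (sliding_window_view is available from NumPy 1.20 onward): each row of windows holds the 2*n+1 values centred on one element, and the np.r_ index drops the centre column so that only the neighbours remain.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view as swv

n = 2
a = np.array([2, 2, 2, 1, 1, 2, 2], dtype=float)

# Pad n NaNs on each side so every element gets a full window of 2*n+1 values
windows = swv(np.pad(a, n, constant_values=np.nan), 2 * n + 1)
print(windows.shape)  # (7, 5)

# Column n of each window is the current value itself; keep only the neighbours
neighbours = windows[:, np.r_[:n, n + 1:2 * n + 1]]
print(neighbours[3])  # [2. 2. 1. 2.]
```

The NaN padding is harmless because any comparison with NaN is False, so missing neighbours at the edges never count as matches.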

An alternative using only pandas requires computing two rolling windows (one backward, one forward), because it is not trivial to access the current row within a centered rolling window with min_periods=1:

n = 2
q = 3

s1 = df['x'].rolling(n+1, min_periods=2).apply(lambda x: sum(x.iloc[-1]<=x.iloc[:-1]/2))

s2 = df.loc[::-1, 'x'].rolling(n+1, min_periods=2).apply(lambda x: sum(x.iloc[-1]<=x.iloc[:-1]/2))

df['comparison'] = s1.add(s2, fill_value=0).ge(q).astype(int)
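To see why the two windows add up to the full count, it can help to print them separately. The sketch below reuses the same rolling logic with a named helper (matches_vs_last is a hypothetical name, not part of pandas): s1 counts matches among the rows above, s2 among the rows below, and at rows 3 and 4 their sums are 2 + 1 and 1 + 2.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 2, 2, 1, 1, 2, 2]})
n, q = 2, 3

def matches_vs_last(w):
    # w is a Series ending with the current value; compare it to the earlier entries
    return (w.iloc[-1] <= w.iloc[:-1] / 2).sum()

# Backward window: the current row plus up to n rows above
s1 = df['x'].rolling(n + 1, min_periods=2).apply(matches_vs_last)
# Forward window: reversing the column turns "above" into "below"
s2 = df.loc[::-1, 'x'].rolling(n + 1, min_periods=2).apply(matches_vs_last)

print(s1.tolist())               # [nan, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0]
print(s2.sort_index().tolist())  # [0.0, 0.0, 0.0, 1.0, 2.0, 0.0, nan]
```

The NaN at each end comes from min_periods=2 (only one value in the window), which is why the answer adds the two series with fill_value=0.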

Output:

   x  comparison
0  2           0
1  2           0
2  2           0
3  1           1
4  1           1
5  2           0
6  2           0
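A third option, closer to the shift idea from the question, is to line up each neighbour with shift(k) and shift(-k) for k = 1..n and sum the boolean matches. A minimal sketch, assuming the same condition as in the example (current <= neighbour / 2); it relies on comparisons with NaN evaluating to False, so neighbours missing at the edges are not counted:

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 2, 2, 1, 1, 2, 2]})
n, q = 2, 3

count = pd.Series(0, index=df.index)
for k in range(1, n + 1):
    # shift(k) brings the value k rows above; shift(-k) the value k rows below.
    # Rows without such a neighbour get NaN, and x <= NaN is False.
    for neighbour in (df['x'].shift(k), df['x'].shift(-k)):
        count += (df['x'] <= neighbour / 2).astype(int)

df['comparison'] = (count >= q).astype(int)
print(df)
```

This creates 2*n shifted columns, which is cheap for small n; for large windows the sliding window view above avoids the Python-level loop.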