Let's take two datasets:
import pandas as pd
import numpy as np
df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])
I want to do the following:
- If any of the numbers in df[0:3] is greater than check_df[0], return 1, and 0 otherwise.
- If any of the numbers in df[1:4] is greater than check_df[1], return 1, and 0 otherwise.
- And so on...
This can be done with the rolling function and a custom function:
def custom_fun(x: pd.DataFrame):
    return (x > float(check_df.iloc[0])).any()
and then combining it with the apply function:
df.rolling(3, min_periods=3).apply(custom_fun).shift(-2)
The main problem with my solution is that I always compare with check_df[0], whereas in the i-th rolling window I should compare with check_df[i]. I have no idea how to specify that inside the rolling function. Could you please give me a hand with this problem?
Answer:
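IIUC, you could use the first index label of x to look up the matching row of check_df, for example with first_valid_index:
def custom_fun(x: pd.Series):
    # x is one window; its index holds the original row labels.
    # df has the default 0..n-1 index, so the label is also a valid position for iloc.
    # iloc[..., 0] returns a scalar, so no float() call on a one-element Series is needed.
    return (x > check_df.iloc[x.first_valid_index(), 0]).any()
res = df.rolling(3, min_periods=3).apply(custom_fun).shift(-2)
print(res)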
Output:
     0
0  0.0
1  1.0
2  0.0
3  1.0
4  1.0
5  0.0
6  1.0
7  NaN
8  NaN
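The lookup works because, with the default raw=False, apply passes each window as a Series that keeps the row labels of the original frame. A minimal sketch to see this, where seen and record_index are hypothetical names introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])

seen = []  # hypothetical helper list: index labels of each window passed to apply

def record_index(x: pd.Series) -> float:
    # Store the labels of the current window; the returned value is irrelevant here.
    seen.append(list(x.index))
    return 0.0

df.rolling(3, min_periods=3).apply(record_index)
print(seen[:3])
```

The first recorded window carries the labels 0, 1, 2, the second 1, 2, 3, and so on, so the first label of each window is the i needed for check_df[i].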
As an alternative, take the first label of the window's index directly:
def custom_fun(x: pd.Series):
    # x.index[0] is the label of the first row in the window
    return (x > check_df.iloc[x.index[0], 0]).any()
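If the data contains no NaN values, "any value in the window is greater than check_df[i]" is the same as "the window maximum is greater than check_df[i]". Under that assumption, a possible sketch that avoids apply altogether (this is a different technique from the one above, not a drop-in copy of it; window_max is a name chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

# Maximum of each window df[i:i+3], shifted so it sits on the window's first row.
window_max = df.rolling(3, min_periods=3).max().shift(-2)

# 1.0 where the window maximum exceeds check_df[i], 0.0 otherwise;
# rows without a full window are kept as NaN.
res = (window_max > check_df).astype(float).where(window_max.notna())
print(res)
```

On the sample data this should give the same column as the apply-based version, and it tends to be faster on long frames because no Python function is called per window.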