Applying rolling function with second data frame

Time:07-22

Let's take two datasets:

import pandas as pd 
import numpy as np
df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])

check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

I want to do the following:

  1. If any of the numbers in df[0:3] is greater than check_df[0], return 1; otherwise return 0.
  2. If any of the numbers in df[1:4] is greater than check_df[1], return 1; otherwise return 0.
  3. And so on for every later window.
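
The rule in the list above can be written as a plain loop, which is useful as a reference to check any rolling-based solution against. This is only a sketch: the helper name `any_greater_in_window` and its `window` parameter are made up for illustration, and it assumes both frames have a single column and the same length.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])


def any_greater_in_window(values, thresholds, window=3):
    """Hypothetical helper (not a pandas API).

    For each start position i, store 1.0 if any of values[i:i + window]
    is greater than thresholds[i], else 0.0. Positions without a full
    window stay NaN.
    """
    values = np.asarray(values, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    out = np.full(len(values), np.nan)
    for i in range(len(values) - window + 1):
        out[i] = float((values[i:i + window] > thresholds[i]).any())
    return pd.Series(out)


expected = any_greater_in_window(df[0], check_df[0])
print(expected.tolist())
```

The loop is slow for long inputs, but it states the requirement directly: window i is always compared with row i of check_df.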

This can be done with a rolling window and a custom function:

def custom_fun(x: pd.Series) -> bool:
    # rolling.apply passes each window as a Series, not a DataFrame
    return (x > check_df.iloc[0, 0]).any()

The function is then passed to apply:

df.rolling(3, min_periods = 3).apply(custom_fun).shift(-2)

The main problem with my solution is that it always compares against check_df[0], whereas in the i-th rolling window it should compare against check_df[i]. I have no idea how to specify this inside the rolling function. Could you please give me a hand with this problem?

CodePudding user response:

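IIUC, you could use the first index of x, for example with first_valid_index. With the default raw=False, apply passes each window as a Series that keeps its original index labels, so the first label marks where the window starts. Because that label is used positionally with iloc, this relies on the default 0..n-1 index:

def custom_fun(x: pd.Series) -> bool:
    # the first index label of the window selects the matching check_df row
    return (x > check_df.iloc[x.first_valid_index(), 0]).any()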


res = df.rolling(3, min_periods=3).apply(custom_fun).shift(-2)

print(res)

Output

     0
0  0.0
1  1.0
2  0.0
3  1.0
4  1.0
5  0.0
6  1.0
7  NaN
8  NaN

As an alternative, take the first label of the window's index directly:

def custom_fun(x: pd.Series) -> bool:
    return (x > check_df.iloc[x.index[0], 0]).any()
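
For longer inputs, a vectorized variant may be worth considering. This is not what the answer above proposes but a different technique: "any value in the window is greater than c" is equivalent to "the window maximum is greater than c", so the built-in rolling max can replace the custom function. A minimal sketch, assuming both frames share the same default index; the names `window`, `window_max` and `res_vectorized` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

window = 3

# Maximum of each window, shifted so that row i holds max(df[i:i + window]).
window_max = (
    df[0].rolling(window, min_periods=window).max().shift(-(window - 1))
)

# Row-wise comparison with check_df; rows without a full window become NaN
# again, matching the shape of the apply-based result.
res_vectorized = (
    (window_max > check_df[0]).astype(float).where(window_max.notna())
)
print(res_vectorized)
```

This avoids calling a Python function once per window, at the cost of only working for comparisons that can be reduced to a rolling aggregate such as max or min.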