Home > Mobile >  Correlation between two dataframe colunms using a lookback window
Correlation between two dataframe colunms using a lookback window

Time:03-29

index Date Col_A Col_B Detection
0   1 Jan      0    1   0
1   2 Jan      0    0   0
2   3 Jan      1    0   0
3   4 Jan      0    1   1
4   5 Jan      0    0   0
5   6 Jan      1    0   0
6   7 Jan      0    0   0
7   8 Jan      0    0   1
8   9 Jan      0    0   0
9   10 Jan     0    0   0
10  11 Jan     0    0   1

I have the above datafame. I want to find a correlation between "Detection" and Col_A orCol_B columns as follows:

Loop over Detection column where ever df1["Detection"]==1, Then compare it with the index of Col_A, if df1["Col_A"]==1, then report there is a correlation (yes) else, lookback at two earlier position (say shift(-2)) if there exist an item with value ==1, Then we report yes else report No

Below is my trial code

df1["Corr_with_A"] = np.where((df1['Col_A'] == 1 or df1['Col_A'].shif(-1) == 1 or df1['Col_A'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")
df1["Corr_with_B"] = np.where((df1['Col_B'] == 1 or df1['Col_B'].shif(-1) == 1 or df1['Col_B'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")

My expected output (How I wanted my output)

index Date  Col_A   Col_B   Detection   Corr_with_A Corr_with_B
0   1 Jan      0    1      0           no          no
1   2 Jan      0    0      0           no          no
2   3 Jan      1    0      0           no          no
3   4 Jan      0    1      1          Yes          Yes
4   5 Jan      0    0      0           no          no
5   6 Jan      1    0      0           no          no
6   7 Jan      0    0      0           no          no
7   8 Jan      0    0      1           Yes         no
8   9 Jan      0    0      0           no          no
9   10 Jan     0    0      0           no          no
10  11 Jan     0    0      1           no          no

Can someone please come up with a better way to achieve this? My code is giving me error. Thank you.

CodePudding user response:

This is a great use case for rolling.max:

N = 3 # number of rows to consider
m0 = df['Detection'].eq(1)
m1 = df['Col_A'].rolling(window=N, min_periods=1).max().eq(1)
m2 = df['Col_B'].rolling(window=N, min_periods=1).max().eq(1)

df['Corr_with_A'] = np.where(m0&m1, 'yes', 'no')
df['Corr_with_B'] = np.where(m0&m2, 'yes', 'no')

output:

    index Date  Col_A  Col_B  Detection Corr_with_A Corr_with_B
0       1  Jan      0      1          0          no          no
1       2  Jan      0      0          0          no          no
2       3  Jan      1      0          0          no          no
3       4  Jan      0      1          1         yes         yes
4       5  Jan      0      0          0          no          no
5       6  Jan      1      0          0          no          no
6       7  Jan      0      0          0          no          no
7       8  Jan      0      0          1         yes          no
8       9  Jan      0      0          0          no          no
9      10  Jan      0      0          0          no          no
10     11  Jan      0      0          1          no          no

as a loop:

N = 3 # number of rows to consider
m0 = df['Detection'].eq(1)

for col in ['A', 'B']:
    m_rol = df[f'Col_{col}'].rolling(window=N, min_periods=1).max().eq(1)
    df[f'Corr_with_{col}'] = np.where(m0&m_rol, 'yes', 'no')
  • Related