index Date Col_A Col_B Detection
0 1 Jan 0 1 0
1 2 Jan 0 0 0
2 3 Jan 1 0 0
3 4 Jan 0 1 1
4 5 Jan 0 0 0
5 6 Jan 1 0 0
6 7 Jan 0 0 0
7 8 Jan 0 0 1
8 9 Jan 0 0 0
9 10 Jan 0 0 0
10 11 Jan 0 0 1
I have the above datafame. I want to find a correlation between "Detection"
and Col_A
orCol_B
columns as follows:
Loop over Detection
column where ever df1["Detection"]==1, Then compare it with the index of Col_A, if df1["Col_A"]==1
, then report there is a correlation (yes
) else, lookback at two earlier position (say shift(-2)
) if there exist an item with value ==1
, Then we report yes
else report No
Below is my trial code
df1["Corr_with_A"] = np.where((df1['Col_A'] == 1 or df1['Col_A'].shif(-1) == 1 or df1['Col_A'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")
df1["Corr_with_B"] = np.where((df1['Col_B'] == 1 or df1['Col_B'].shif(-1) == 1 or df1['Col_B'].shif(-2) == 1) & (df1['Detection'] ==1), "Yes", "no")
My expected output (How I wanted my output)
index Date Col_A Col_B Detection Corr_with_A Corr_with_B
0 1 Jan 0 1 0 no no
1 2 Jan 0 0 0 no no
2 3 Jan 1 0 0 no no
3 4 Jan 0 1 1 Yes Yes
4 5 Jan 0 0 0 no no
5 6 Jan 1 0 0 no no
6 7 Jan 0 0 0 no no
7 8 Jan 0 0 1 Yes no
8 9 Jan 0 0 0 no no
9 10 Jan 0 0 0 no no
10 11 Jan 0 0 1 no no
Can someone please come up with a better way to achieve this? My code is giving me error. Thank you.
CodePudding user response:
This is a great use case for rolling.max
:
N = 3 # number of rows to consider
m0 = df['Detection'].eq(1)
m1 = df['Col_A'].rolling(window=N, min_periods=1).max().eq(1)
m2 = df['Col_B'].rolling(window=N, min_periods=1).max().eq(1)
df['Corr_with_A'] = np.where(m0&m1, 'yes', 'no')
df['Corr_with_B'] = np.where(m0&m2, 'yes', 'no')
output:
index Date Col_A Col_B Detection Corr_with_A Corr_with_B
0 1 Jan 0 1 0 no no
1 2 Jan 0 0 0 no no
2 3 Jan 1 0 0 no no
3 4 Jan 0 1 1 yes yes
4 5 Jan 0 0 0 no no
5 6 Jan 1 0 0 no no
6 7 Jan 0 0 0 no no
7 8 Jan 0 0 1 yes no
8 9 Jan 0 0 0 no no
9 10 Jan 0 0 0 no no
10 11 Jan 0 0 1 no no
as a loop:
N = 3 # number of rows to consider
m0 = df['Detection'].eq(1)
for col in ['A', 'B']:
m_rol = df[f'Col_{col}'].rolling(window=N, min_periods=1).max().eq(1)
df[f'Corr_with_{col}'] = np.where(m0&m_rol, 'yes', 'no')