Getting A Correlation Column Based on Two Columns with A List Value

Time:09-21

I have the following dataset:

import pandas as pd

df = pd.DataFrame({'A': [[10, 11, 12], [13, 14, 15]],
                   'B': [[17, 18, 12], [21, 22, 13]]})
df

          A               B
0   [10, 11, 12]    [17, 18, 12]
1   [13, 14, 15]    [21, 22, 13]

Now I want to create a new column, Correlation, from the A and B columns using the scipy.stats.pearsonr function. This is what I'm trying:

from scipy import stats

# Return the Pearson correlation coefficient for one row
def correlation(row):
    r, p_value = stats.pearsonr(row['A'], row['B'])
    return r

# Applying the function
df['Correlation'] = df.apply(correlation, axis = 1)
df

          A               B         Correlation
0   [10, 11, 12]    [17, 18, 12]    -0.777714
1   [13, 14, 15]    [21, 22, 13]    -0.810885

If I have many columns, the script above would not be ideal. Can I use stats.pearsonr directly in a lambda to get the same result?

Any suggestions would be appreciated. Thanks!

CodePudding user response:

I recommend using zip in a list comprehension:

df['out'] = [stats.pearsonr(x, y)[0] for x, y in zip(df.A, df.B)]
df
Out[163]: 
              A             B       out
0  [10, 11, 12]  [17, 18, 12] -0.777714
1  [13, 14, 15]  [21, 22, 13] -0.810885
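
Since the question asks specifically about a lambda, here is a minimal sketch of that form, plus one way the zip approach might be extended to several column pairs. The helper add_correlations and the corr_<left>_<right> column names are hypothetical, made up for illustration; they are not part of pandas or SciPy.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'A': [[10, 11, 12], [13, 14, 15]],
                   'B': [[17, 18, 12], [21, 22, 13]]})

# Lambda form of the row-wise apply from the question.
# pearsonr returns (coefficient, p-value); index 0 is the coefficient.
df['Correlation'] = df.apply(
    lambda row: stats.pearsonr(row['A'], row['B'])[0], axis=1)

# Hypothetical helper (an assumption, not from the original post):
# adds one correlation column per (left, right) pair of list-valued columns.
def add_correlations(frame, pairs):
    for left, right in pairs:
        frame[f'corr_{left}_{right}'] = [
            stats.pearsonr(x, y)[0]
            for x, y in zip(frame[left], frame[right])
        ]
    return frame

df = add_correlations(df, [('A', 'B')])
print(df)
```

Both columns should hold the same values as the out column above. The lambda is shorter to write, but the zip version avoids building a Series for each row, so it is likely the faster of the two on large frames.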