Correlation of every pandas row with another pandas dataframe as a new column

Time:07-27

Assuming I have the following df:

Company   Apples   Mangoes   Oranges
Amazon      0.75      0.60      0.98
BellTM      0.23      0.75      0.14
Cadbury     0.40      0.44      0.86

and then another data frame called vendor:

Company   Apples   Mangoes   Oranges
Deere       0.11      0.30      0.79

I want to find the row-wise correlation of each company in df with the company Deere in the vendor data frame, and add the resulting correlation coefficient to the original data frame df as a new column called Corrcoef:

Company   Apples   Mangoes   Oranges      Corrcoef
Amazon      0.75      0.60      0.98    0.77955981
BellTM      0.23      0.75      0.14   -0.37694478
Cadbury     0.40      0.44      0.86    0.98092707

When I attempt the following:

df.iloc[:,1:].corrwith(vendor.iloc[:,1:], axis=1)

I get a result filled with NaN values. I obtained the Corrcoef values shown above manually, by saving each row as an array and calling np.corrcoef(x1, y).
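For reference, the manual approach described above can be sketched as follows. This is a minimal reconstruction that assumes the two frames are built exactly as in the tables; the names y and manual are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the tables in the question.
df = pd.DataFrame({
    "Company": ["Amazon", "BellTM", "Cadbury"],
    "Apples": [0.75, 0.23, 0.40],
    "Mangoes": [0.60, 0.75, 0.44],
    "Oranges": [0.98, 0.14, 0.86],
})
vendor = pd.DataFrame({
    "Company": ["Deere"],
    "Apples": [0.11],
    "Mangoes": [0.30],
    "Oranges": [0.79],
})

# Deere's numeric values as a plain float array.
y = vendor.iloc[0, 1:].to_numpy(dtype=float)

# np.corrcoef returns a 2x2 matrix; element [0, 1] is the Pearson
# correlation between the two input vectors.
manual = [
    np.corrcoef(row, y)[0, 1]
    for row in df.iloc[:, 1:].to_numpy(dtype=float)
]
print(manual)
```

This computes one coefficient per row of df and matches the Corrcoef values listed in the question.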

CodePudding user response:

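You need to pass a Series to corrwith, not a DataFrame. When the other argument is a DataFrame, corrwith aligns the two frames on their labels first, so rows of df that have no matching row label in vendor come back as NaN. When it is a Series, every row of df is correlated against that same Series.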

You can use:

df.set_index('Company').corrwith(vendor.set_index('Company').loc['Deere'], axis=1)

output:

Company
Amazon     0.779560
BellTM    -0.376945
Cadbury    0.980927
dtype: float64

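Or, keeping your positional indexing (the selected row has object dtype because it comes from a frame that also holds the Company strings, hence the astype(float)):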

df.iloc[:, 1:].corrwith(vendor.iloc[0,1:].astype(float), axis=1)

output:

0    0.779560
1   -0.376945
2    0.980927
dtype: float64