Finding correlation in dataframe

Time:06-30

I have a pandas DataFrame (df) whose columns are named x_1, x_2, ..., x_n. I want to find the Pearson correlation between the ith column and each of the remaining columns.

One way to do this is with the .corr() method:

correlation = df.corr(method='pearson')
corr_i = correlation['x_i']

but this is somewhat expensive, because it computes correlations between every pair of columns when I only need them for one column. The other method I could use is

corr_i = [df['x_i'].corr(df[j], method='pearson') for j in df.columns.tolist() if j != 'x_i']

but I feel this is not an efficient way of finding the correlation, given how flexible DataFrames are. Can anyone suggest a more efficient method than the two above? Thanks in advance.
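For reference, here is a self-contained version of the two approaches from the question. It assumes a small hypothetical DataFrame of random data with columns x_1 to x_4, and uses x_2 as the target column. It also shows two details. First, the column taken from df.corr() includes the target's correlation with itself (1.0), which has to be dropped. Second, the list comprehension loses the column labels, while a Series keeps them. On cost, with n columns the full matrix covers n(n-1)/2 distinct column pairs, whereas a single column needs only n-1 of them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows, columns x_1 .. x_4 (names assumed for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=[f"x_{k}" for k in range(1, 5)])
target = "x_2"  # stands in for x_i from the question

# Approach 1: full n x n Pearson matrix, then one column of it.
# That column includes the target's correlation with itself, so drop it.
corr_full = df.corr(method="pearson")[target].drop(target)

# Approach 2: one pairwise .corr() call per remaining column.
# Building a Series (not a list) keeps each value labelled by column name.
corr_loop = pd.Series(
    {col: df[target].corr(df[col], method="pearson")
     for col in df.columns if col != target}
)

print(corr_full)
print(corr_loop)
```

Both produce the same coefficients up to floating-point rounding.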

Answer:

corrwith() might be what you are looking for.

Say you have a DataFrame with columns c1, c2, c3, c4.

Then you should be able to correlate c1 with each of the other columns using:

df[['c2','c3','c4']].corrwith(df['c1'])
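A runnable sketch of the corrwith() approach, written with the question's naming instead of c1 to c4. The data and the choice of x_2 as the target are hypothetical. Dropping the target column first means the result holds exactly the n-1 correlations asked for, and the cross-check against the corresponding slice of df.corr() confirms the values match.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows, columns x_1 .. x_4 (names assumed for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=[f"x_{k}" for k in range(1, 5)])
target = "x_2"  # stands in for x_i from the question

# corrwith pairs each column of the caller with the Series argument
# (aligned on the index) and returns one Pearson coefficient per column
corr_i = df.drop(columns=target).corrwith(df[target])
print(corr_i)

# Cross-check against the corresponding slice of the full matrix
full = df.corr()[target].drop(target)
print(np.allclose(corr_i, full))  # True
```

Like corr(), corrwith() also accepts a method argument (for example method='spearman'). How much faster it is than building the full matrix depends on the number of rows and columns, so it is worth timing on the real data.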