Correlation coefficient between dataframe columns in Julia

Time:11-17

I have a dataframe like this:

    datetime             sensor1    sensor2 
    String               Int64      Int64
1   2021-09-28 13:36:04  626        570
2   2021-09-28 13:36:04  622        571
3   2021-09-28 13:36:05  620        574
4   2021-09-28 13:36:06  619        578

I would like to get the correlation coefficient between the columns sensor1 and sensor2 of the dataframe above. In Python, for example, I can do it like this:

  cor = np.corrcoef(data.sensor1[0:] , data.sensor2[0:])[0,1]

How can I get the correlation coefficient in Julia?

Answer:

Use cor from the Statistics standard library:

julia> using Statistics, DataFrames

julia> df = DataFrame(sensor1 = [626, 622, 620, 619], sensor2 = [570, 571, 574, 578])
4×2 DataFrame
 Row │ sensor1  sensor2 
     │ Int64    Int64   
─────┼──────────────────
   1 │     626      570
   2 │     622      571
   3 │     620      574
   4 │     619      578

julia> cor(Matrix(df))
2×2 Matrix{Float64}:
  1.0       -0.861357
 -0.861357   1.0

Passing Matrix(df) gives you back a correlation matrix containing the pairwise correlations between all columns.
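The dataframe in the question also has a datetime column of type String, and cor needs numeric input, so that column has to be left out before converting to a matrix. A minimal sketch, assuming the column names from the question and selecting the two sensor columns explicitly:

```julia
using Statistics, DataFrames

# Same sample data as in the question, including the non-numeric datetime column.
df = DataFrame(
    datetime = ["2021-09-28 13:36:04", "2021-09-28 13:36:04",
                "2021-09-28 13:36:05", "2021-09-28 13:36:06"],
    sensor1 = [626, 622, 620, 619],
    sensor2 = [570, 571, 574, 578],
)

# Keep only the columns to correlate, then convert them to a numeric matrix.
sensors = Matrix(df[:, [:sensor1, :sensor2]])

# 2×2 correlation matrix for sensor1 and sensor2 only.
C = cor(sensors)
```

Listing the columns by name keeps the selection explicit; with many sensor columns you would probably want to build that list programmatically instead.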

If you only need the correlation between two columns, which I guess is what your Python example is after:

julia> cor(df.sensor1, df.sensor2)
-0.861356769214109
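To see what that number represents, the Pearson correlation can also be written out by hand and compared with cor. This is only an illustrative sketch; pearson is a hypothetical helper name, not something provided by Statistics:

```julia
using Statistics

# Pearson correlation written out by hand (illustrative helper, not a library function).
function pearson(x, y)
    dx = x .- mean(x)   # deviations of x from its mean
    dy = y .- mean(y)   # deviations of y from its mean
    return sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2))
end

sensor1 = [626, 622, 620, 619]
sensor2 = [570, 571, 574, 578]

r_manual = pearson(sensor1, sensor2)
r_builtin = cor(sensor1, sensor2)

# The two values should agree up to floating-point rounding.
println(isapprox(r_manual, r_builtin))
```

In practice cor is the better choice; the hand-written version is only there to show the formula.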

EDIT: I see you are indexing with [0, 1] in Python, so you are probably getting back a 2×2 matrix there as well. Arrays in Julia are 1-based, so the equivalent would be cor(Matrix(df))[1, 2]. If you only want one number, though, there is no point in computing the whole correlation matrix.
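As a quick check of the indexing point, here is a small sketch that uses plain vectors instead of a dataframe (an assumption made to keep it dependency-free) and compares the matrix entry with the direct two-argument call:

```julia
using Statistics

sensor1 = [626, 622, 620, 619]
sensor2 = [570, 571, 574, 578]

# 4×2 matrix with one column per sensor, standing in for Matrix(df).
M = hcat(sensor1, sensor2)

# Row 1, column 2 in Julia corresponds to [0, 1] in NumPy.
r_from_matrix = cor(M)[1, 2]

# The two-argument form returns the scalar directly, without building a matrix.
r_direct = cor(sensor1, sensor2)
```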
