I have a DataFrame like this:
datetime sensor1 sensor2
String Int64 Int64
1 2021-09-28 13:36:04 626 570
2 2021-09-28 13:36:04 622 571
3 2021-09-28 13:36:05 620 574
4 2021-09-28 13:36:06 619 578
I would like to get the correlation coefficient between the columns sensor1 and sensor2 of the above DataFrame. For example, in Python, I can do it as:
cor = np.corrcoef(data.sensor1[0:] , data.sensor2[0:])[0,1]
How can I get the correlation coefficient in Julia?
CodePudding user response:
Use cor from the Statistics standard library:
julia> using Statistics, DataFrames
julia> df = DataFrame(sensor1 = [626, 622, 620, 619], sensor2 = [570, 571, 574, 578])
4×2 DataFrame
Row │ sensor1 sensor2
│ Int64 Int64
─────┼──────────────────
1 │ 626 570
2 │ 622 571
3 │ 620 574
4 │ 619 578
julia> cor(Matrix(df))
2×2 Matrix{Float64}:
1.0 -0.861357
-0.861357 1.0
Here, passing Matrix(df) means you get back a correlation matrix with the correlations between all pairs of columns.
More specifically for just two columns, which I guess is in line with your Python example:
julia> cor(df.sensor1, df.sensor2)
-0.861356769214109
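As a sanity check on what cor returns for two vectors, here is a minimal sketch that computes Pearson's r straight from its definition and compares it with the built-in. The helper name pearson is made up for illustration and is not part of Statistics; the data are the four rows from the question.

```julia
using Statistics

sensor1 = [626, 622, 620, 619]
sensor2 = [570, 571, 574, 578]

# Hypothetical helper (not a library function): Pearson's r as the sum of
# products of deviations divided by the square root of the product of the
# sums of squared deviations.
function pearson(x, y)
    dx = x .- mean(x)
    dy = y .- mean(y)
    return sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2))
end

r_manual  = pearson(sensor1, sensor2)
r_builtin = cor(sensor1, sensor2)

println(r_manual ≈ r_builtin)
```

In practice you would just call cor; the hand-written version is only there to show what the number means.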
EDIT: Actually, I see you are doing [0, 1] indexing in Python, so you're probably getting back a 2x2 matrix there as well. Arrays in Julia are 1-based, so the equivalent would be cor(Matrix(df))[1, 2]. If you only want one number, though, there's no point computing all cross-correlations.
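To make the indexing point concrete, here is a small sketch of the matrix route using plain vectors and hcat instead of a DataFrame, so it only needs the Statistics standard library. The variable names M, C and r are just for illustration.

```julia
using Statistics

sensor1 = [626, 622, 620, 619]
sensor2 = [570, 571, 574, 578]

# Stack the two numeric columns side by side into a 4×2 matrix.
M = hcat(sensor1, sensor2)

# cor on a matrix treats each column as a variable and returns
# the full 2×2 correlation matrix.
C = cor(M)

# 1-based indexing: row 1, column 2 is sensor1 vs sensor2,
# the counterpart of [0, 1] in the NumPy snippet.
r = C[1, 2]

println(r ≈ cor(sensor1, sensor2))
```

One caveat, stated as an assumption about your data rather than something tested here: the DataFrame in the question also has a String datetime column, so Matrix(df) on it would mix strings with numbers and cor would most likely fail. Selecting the numeric columns first, for example Matrix(df[:, [:sensor1, :sensor2]]), or calling cor(df.sensor1, df.sensor2) directly, avoids that.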