I have a data frame of 70 rows x 64000 columns. I want to find the correlations between its columns and rows and sort them by absolute value, but when I use the coef() function I get NULL:
> coef(expressions70)
NULL
Is there any way to get coefficients similar to the pairs.panels() output from the psych package, or any other way to show the coefficients?
Thanks
Answer:
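No, pairs.panels() does not create an object you can pull the coefficients from: if you assign its result to a variable, you'll see that the value is NULL. Likewise, coef() extracts the coefficients of a fitted model (for example one returned by lm()), not correlations, which is why it returns NULL for a data frame. However, there are still several ways to get at the correlations. (Keep Anscombe's quartet in mind, though: I would suggest that you don't take the correlation coefficient at face value.)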
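A more direct base-R route, which is not part of the original answer, is to compute the whole correlation matrix with cor(), keep each pair once via the upper triangle, and sort by absolute value. The helper name sorted_cors is hypothetical, and it assumes every column is numeric. Note that the full matrix has p x p entries: for 64000 columns that is about 4.1 billion doubles (roughly 32.8 GB), so this suits small examples like iris rather than the full data set.

```r
# Hypothetical helper (not from the original answer): every pairwise Pearson
# correlation of an all-numeric data frame or matrix, each pair listed once,
# sorted by absolute value (strongest first).
sorted_cors <- function(x) {
  r <- cor(as.matrix(x), use = "everything", method = "pearson")
  upper <- upper.tri(r)                   # TRUE strictly above the diagonal
  idx <- which(upper, arr.ind = TRUE)     # (row, col) of each kept cell
  out <- data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[upper],                      # same column-major order as idx
    stringsAsFactors = FALSE
  )
  out <- out[order(abs(out$r), decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

sorted_cors(iris[, 1:4])
```

For iris the first row is Petal.Length/Petal.Width (r of about 0.963). If you want correlations between rows instead of columns, transpose first, e.g. sorted_cors(t(as.matrix(df))); with 70 rows that is only a 70 x 70 matrix.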
Both options below create a named vector as the output: each name is the pair of correlated fields, and each value is their Pearson correlation coefficient.
If all your fields are numeric, or you know exactly which columns are numeric, use the first option. If you have date, character, or factor fields mixed in with the numeric columns, use the second option.
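For the mixed-type case, the numeric columns can also be picked out in base R without funModeling. This is a minimal sketch; the helper name numeric_fields and the small mixed data frame are hypothetical. It relies on is.numeric() returning FALSE for factor, character, and Date columns, so only integer and double columns survive.

```r
# Hypothetical helper: names of the integer/double columns of a data frame.
# is.numeric() is FALSE for factor, character and Date columns.
numeric_fields <- function(df) {
  names(df)[vapply(df, is.numeric, logical(1))]
}

# Hypothetical mixed-type example data
mixed <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  day   = as.Date("2020-01-01") + 0:2,
  value = c(1.5, 2.0, 3.5)
)
numeric_fields(mixed)
# [1] "id"    "value"
```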
First option:
library(funModeling)  # df_status(), used in the second option
library(tidyverse)    # purrr::map() and the %>% pipe
library(RcppAlgos)    # comboGeneral()
# every unordered pair of columns, without repeats or self-pairs
tellMe <- comboGeneral(names(iris[, 1:4]), 2) %>%
  as.data.frame()
# one named correlation per pair, e.g. "Sepal.Length-Sepal.Width"
showMe <- map(seq_len(nrow(tellMe)),
              ~ setNames(
                  cor(iris[, tellMe[.x, 1]],
                      iris[, tellMe[.x, 2]],
                      use = "everything", method = "pearson"),
                  paste0(tellMe[.x, ], collapse = "-"))
              ) %>%
  unlist()
# sort by absolute value, strongest relationships first
showMe <- showMe[order(abs(showMe), decreasing = TRUE)]
showMe
# Petal.Length-Petal.Width Sepal.Length-Petal.Length
# 0.9628654 0.8717538
# Sepal.Length-Petal.Width Sepal.Width-Petal.Length
# 0.8179411 -0.4284401
# Sepal.Width-Petal.Width Sepal.Length-Sepal.Width
# -0.3661259 -0.1175698
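If you would rather avoid the RcppAlgos dependency, base R's combn() enumerates the same unordered pairs. This is a minimal sketch of the first option using only base R; the variable names are illustrative.

```r
# Base-R alternative to comboGeneral(): every unordered pair of columns, once.
fields <- names(iris)[1:4]
field_pairs <- combn(fields, 2)          # 2 x choose(4, 2) character matrix

# Pearson correlation for each pair (one column of field_pairs per pair)
rho <- apply(field_pairs, 2, function(p) cor(iris[[p[1]]], iris[[p[2]]]))
names(rho) <- apply(field_pairs, 2, paste, collapse = "-")

# sort by absolute value, strongest relationships first
rho_sorted <- rho[order(abs(rho), decreasing = TRUE)]
rho_sorted
```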
Second option:
This starts by identifying which fields are integer or numeric, then follows the same path as the first option.
I first tried select(where(is.numeric)), but where() was not exported from tidyselect at the time (it was only reachable as tidyselect:::where), so I went with an alternative method. Newer tidyselect releases do export where(). If this means nothing to you, just ignore this comment.
library(funModeling)  # df_status()
library(tidyverse)    # dplyr verbs, purrr::map() and the %>% pipe
library(RcppAlgos)    # comboGeneral()
# if some variables are not numeric, keep only the integer or numeric fields
fields <- df_status(iris) %>%
  filter(type %in% c("integer", "numeric")) %>%
  pull(variable)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"
# find all possible pairs (no repeats, no self-pairs)
giveMe <- comboGeneral(fields, 2) %>%
  as.data.frame()
itsShown <- map(seq_len(nrow(giveMe)),
                ~ setNames(
                    cor(iris[, giveMe[.x, 1]],
                        iris[, giveMe[.x, 2]],
                        use = "everything", method = "pearson"),
                    paste0(giveMe[.x, ], collapse = "-"))
                ) %>%
  unlist()
# sort by absolute value, strongest relationships first
itsShown <- itsShown[order(abs(itsShown), decreasing = TRUE)]
itsShown
# Petal.Length-Petal.Width Sepal.Length-Petal.Length
# 0.9628654 0.8717538
# Sepal.Length-Petal.Width Sepal.Width-Petal.Length
# 0.8179411 -0.4284401
# Sepal.Width-Petal.Width Sepal.Length-Sepal.Width
# -0.3661259 -0.1175698
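With the real 70 x 64000 data, looping over every pair or building the full matrix runs into size limits: there are 64000 x 63999 / 2, about 2.05 billion, column pairs, and a full 64000 x 64000 matrix of doubles takes about 32.8 GB. Assuming you only need the strongest correlations, one workaround is to correlate one block of columns against all columns at a time and keep only the pairs above a threshold. This is a sketch, not a tested pipeline; the helper top_cors_chunked, its threshold and block_size parameters, and the simulated data are all hypothetical. Each block holds block_size x p values, e.g. 500 x 64000 doubles, about 256 MB.

```r
# Hypothetical helper: column pairs with |r| >= threshold, found block by
# block so the full p x p correlation matrix is never held in memory.
top_cors_chunked <- function(x, threshold = 0.8, block_size = 500) {
  x <- as.matrix(x)
  p <- ncol(x)
  nm <- if (is.null(colnames(x))) paste0("V", seq_len(p)) else colnames(x)
  hits <- list()
  for (start in seq(1, p, by = block_size)) {
    cols <- start:min(start + block_size - 1, p)
    r <- cor(x[, cols, drop = FALSE], x)        # length(cols) x p block
    idx <- which(abs(r) >= threshold, arr.ind = TRUE)
    i <- cols[idx[, 1]]                         # global index of first column
    j <- idx[, 2]                               # global index of second column
    keep <- j > i                               # each pair once, skip diagonal
    if (any(keep)) {
      hits[[length(hits) + 1]] <- data.frame(
        var1 = nm[i[keep]],
        var2 = nm[j[keep]],
        r    = r[idx[keep, , drop = FALSE]],
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(hits) == 0) {
    return(data.frame(var1 = character(), var2 = character(), r = numeric()))
  }
  out <- do.call(rbind, hits)
  out <- out[order(abs(out$r), decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Simulated stand-in for the real data (fewer columns, for a quick run)
set.seed(42)
expr <- matrix(rnorm(70 * 2000), nrow = 70)
head(top_cors_chunked(expr, threshold = 0.5))
```

The threshold is what keeps the result small; sorting all 2 billion coefficients would itself need around 16 GB just for the values.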