Preface: I am a beginner to R that is eager to learn. Please don't mistake the simplicity of the question (if it is a simple answer) for lack of research or effort!
I have a data.table named data
with columns labelled V1 through V20. I want to take the correlation of certain columns (1,6,7,9,10 and 11) relative to V18 such that I have a new table that looks like the following
Variable Correlation_to_V18
V1 cor(V1,V18)
V6 cor(V6,V18)
V7 cor(V7,V18)
V9 cor(V9,V18)
V10 cor(V10,V18)
V11 cor(V11,V18)
I tried using a for loop to no avail
column <- c(1,6,7,9,10,11)
for (i in column) {
correlations<-cor(data[,18], data[,as.numeric(i)])
cor_table<- data.table(variables = colnames(data[,as.numeric(column)]), correlation_val = correlations)
return(cor_table)
}
CodePudding user response:
Let's make some data (50 rows x 20 columns) that sounds like your data:
library(dplyr)
library(tidyr)
library(corrr)
set.seed(123)
df1 <- data.frame(matrix(rnorm(1000),
ncol = 20,
nrow = 50,
dimnames = list(c(), paste0("V", 1:20))))
In general when using R, processing dataframes using loops is a bad idea. There are apply
functions, tidyverse
packages and other packages to make working with dataframes easier.
In this case I would use the corrr
library to return a dataframe with the correlation coefficients. You can then filter for the term of interest V18
, reshape the output from "wide" to "long", and filter again for the other variables.
correlate(df1) %>%
filter(term == "V18") %>%
pivot_longer(2:21) %>%
filter(name %in% c("V1", "V6", "V7", "V9", "V10", "V11"))
Result:
# A tibble: 6 x 3
term name value
<chr> <chr> <dbl>
1 V18 V1 0.125
2 V18 V6 -0.167
3 V18 V7 0.345
4 V18 V9 -0.110
5 V18 V10 0.0473
6 V18 V11 -0.0347
CodePudding user response:
Here is another potential solution:
library(data.table)
set.seed(123)
mat <- matrix(runif(400), nrow = 20, ncol = 20)
data <- data.table(mat)
columns_of_interest <- c(1,6,7,9,10,11,18)
data_subset <- data[, columns_of_interest, with=FALSE]
data_cor_values <- data_subset[, .(Correlation_to_V18 = cor(.SD, V18))]
data_cor_values[, variable := columns_of_interest]
data_cor_values
#> Correlation_to_V18.V1 variable
#> 1: -0.049812188 1
#> 2: -0.375566877 6
#> 3: 0.089879501 7
#> 4: -0.022499113 9
#> 5: -0.007267059 10
#> 6: -0.178489961 11
#> 7: 1.000000000 18
# Trim the last row (V18)
data_cor_values[1:.N-1,]
#> Correlation_to_V18.V1 variable
#> 1: -0.049812188 1
#> 2: -0.375566877 6
#> 3: 0.089879501 7
#> 4: -0.022499113 9
#> 5: -0.007267059 10
#> 6: -0.178489961 11
# Check the answer is correct:
cor(data_subset$V1, data_subset$V18)
#> [1] -0.04981219
cor(data_subset$V6, data_subset$V18)
#> [1] -0.3755669
Created on 2021-11-30 by the reprex package (v2.0.1)