I have two data frames in R, say data1 and data2:
library(tibble)
a = c(1,2,NA,4,5)
b = c(3,4,5,6,7)
data1 = tibble(a,b); data1
a = c(4,2,4,4,9)
b = c(3,4,4,6,7)
d = c(5,9,3,4,2)
# data2 has a column d that data1 lacks
data2 = tibble(a,b,d); data2
I want to calculate the correlation between the matching columns of these two data frames. Keep in mind that some column vectors may contain NA values, and some columns of data2 may not exist in data1; for those I would like to report NA. How can I do that in R using dplyr?
Answer:
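Since column a in data1 contains one NA, and cor() uses use = "everything" by default, the result for a will be NA. Column d does not exist in data1, so it is paired with a vector of NAs and also gives NA. You could do this: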
library(tidyverse)
a = c(1,2,NA,4,5)
b = c(3,4,5,6,7)
data1 = tibble(a,b);
data1
#> # A tibble: 5 × 2
#>       a     b
#>   <dbl> <dbl>
#> 1     1     3
#> 2     2     4
#> 3    NA     5
#> 4     4     6
#> 5     5     7
a = c(4,2,4,4,9)
b = c(3,4,4,6,7)
d = c(5,9,3,4,2)
data2 = tibble(a,b,d);data2
#> # A tibble: 5 × 3
#>       a     b     d
#>   <dbl> <dbl> <dbl>
#> 1     4     3     5
#> 2     2     4     9
#> 3     4     4     3
#> 4     4     6     4
#> 5     9     7     2
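names(data2) %>%
  map_dbl(~ {
    # Use data1's column if it exists, otherwise a vector of NAs of the same length
    col <- if (is.null(data1[[.x]])) {
      rep(NA_real_, nrow(data1))
    } else {
      data1[[.x]]
    }
    # Default use = "everything": any NA in the pair gives NA
    cor(col, data2[[.x]])
  }) %>%
  set_names(names(data2))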
#>         a         b         d
#>        NA 0.9622504        NA
Created on 2022-07-11 by the reprex package (v2.0.1)
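The NA for a comes from cor()'s default use = "everything", under which any missing value makes the result NA. A minimal sketch contrasting it with use = "complete.obs", which drops incomplete pairs before computing the correlation, using the a columns from the example data:

```r
# Column a from data1 (contains one NA) and from data2
x <- c(1, 2, NA, 4, 5)
y <- c(4, 2, 4, 4, 9)

# Default use = "everything": the NA propagates to the result
r_default <- cor(x, y)
print(r_default)
#> [1] NA

# use = "complete.obs": row 3 is dropped, so only the 4 complete pairs are used
r_complete <- cor(x, y, use = "complete.obs")
print(r_complete)
#> [1] 0.7337014
```

Which one is appropriate depends on whether a single missing value should invalidate the whole column or just be skipped.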
Alternatively, piping the result into stack() gives you a data frame:
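names(data2) %>%
  map_dbl(~ {
    # Use data1's column if it exists, otherwise a vector of NAs of the same length
    col <- if (is.null(data1[[.x]])) {
      rep(NA_real_, nrow(data1))
    } else {
      data1[[.x]]
    }
    cor(col, data2[[.x]])
  }) %>%
  set_names(names(data2)) %>%
  # Turn the named vector into a data frame with columns values and ind
  stack()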
#>      values ind
#> 1        NA   a
#> 2 0.9622504   b
#> 3        NA   d
Created on 2022-07-11 by the reprex package (v2.0.1)
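Since the question asks about dplyr, here is a minimal sketch of the same idea written with dplyr::mutate() and purrr::map_dbl(), returning a tibble with one row per column of data2. The helper name cor_matched() and its use argument are hypothetical, introduced only for illustration; it assumes both data frames have the same number of rows in the same order, as in the example.

```r
library(dplyr)
library(purrr)
library(tibble)

data1 <- tibble(a = c(1, 2, NA, 4, 5), b = c(3, 4, 5, 6, 7))
data2 <- tibble(a = c(4, 2, 4, 4, 9), b = c(3, 4, 4, 6, 7), d = c(5, 9, 3, 4, 2))

# Hypothetical helper: one row per column of `y`, NA where `x` lacks that column.
# `use` is passed through to cor(), so the caller chooses how NAs are handled.
cor_matched <- function(x, y, use = "everything") {
  tibble(column = names(y)) %>%
    mutate(correlation = map_dbl(column, function(col) {
      if (!col %in% names(x)) {
        return(NA_real_)  # column missing from x: report NA
      }
      cor(x[[col]], y[[col]], use = use)
    }))
}

res <- cor_matched(data1, data2)
print(res)
```

With the default, a and d come out as NA and b as 0.9622504. Calling cor_matched(data1, data2, use = "complete.obs") instead gives 0.7337014 for a, while d stays NA because it has no counterpart in data1.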
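Answer:
An alternative that correlates only the columns present in both data frames (via intersect()) and drops incomplete pairs with use = "complete.obs". With this approach a gets a value computed from the four complete rows, and d, which is missing from data1, is left out rather than reported as NA: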
library(tibble)
library(purrr)
a = c(1,2,NA,4,5)
b = c(3,4,5,6,7)
data1 = tibble(a,b)
a = c(4,2,4,4,9)
b = c(3,4,4,6,7)
d = c(5,9,3,4,2)
data2 = tibble(a,b,d)
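# Columns present in both data frames; naming the vector keeps the names in map_dbl()'s output
matched <- intersect(colnames(data1), colnames(data2))
names(matched) <- matched
# Correlate each matched pair, dropping rows where either value is NA
map_dbl(matched, ~ cor(data1[[.x]], data2[[.x]], use = "complete.obs")) %>%
  as.matrix() %>%
  as.data.frame() %>%
  rownames_to_column()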
#>   rowname        V1
#> 1       a 0.7337014
#> 2       b 0.9622504
Created on 2022-07-11 by the reprex package (v2.0.1)
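The as.matrix() / as.data.frame() / rownames_to_column() chain can probably be shortened with tibble::enframe(), which converts a named vector directly into a two-column tibble. A minimal sketch of that variant, keeping the same matched-columns logic; the output column names column and correlation are my own choice:

```r
library(tibble)
library(purrr)

data1 <- tibble(a = c(1, 2, NA, 4, 5), b = c(3, 4, 5, 6, 7))
data2 <- tibble(a = c(4, 2, 4, 4, 9), b = c(3, 4, 4, 6, 7), d = c(5, 9, 3, 4, 2))

# Columns present in both data frames, named so map_dbl() keeps the names
matched <- intersect(names(data1), names(data2))
names(matched) <- matched

# Correlate each matched pair, dropping rows where either value is NA
cors <- map_dbl(matched, ~ cor(data1[[.x]], data2[[.x]], use = "complete.obs"))

# enframe() turns the named numeric vector into a tibble in one step
res <- enframe(cors, name = "column", value = "correlation")
print(res)
```

This still omits d; to report it as NA, map over names(data2) and return NA_real_ for names not found in data1.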