xx <- data.frame(group = rep(1:4, each=100), a = rnorm(100) , b = rnorm(100))
xx[c(1,14,33), 'b'] = NA
I'm trying to calculate correlations by group but I'm getting an error when there are NAs.
library(dplyr)
xx %>% group_by(group) %>% summarize(COR=cor(a,b,na.rm=TRUE))
Error: Problem with `summarise()` column `COR`.
i `COR = cor(a, b, na.rm = TRUE)`.
x unused argument (na.rm = TRUE)
i The error occurred in group 1: group = 1.
Run `rlang::last_error()` to see where the error occurred.
CodePudding user response:
There is no na.rm argument in cor; missing values are handled through the use argument instead. According to ?cor, the usage is
cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))
use - an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".
library(dplyr)
xx %>%
  group_by(group) %>%
  summarize(COR = cor(a, b, use = "complete.obs"))
-output
# A tibble: 4 × 2
  group   COR
  <int> <dbl>
1     1 0.166
2     2 0.190
3     3 0.190
4     4 0.190
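As a quick check of what use = "complete.obs" does, here is a minimal sketch on two small made-up vectors (x and y are illustrative and not taken from the question's data). With the default use = "everything", a single NA makes the result NA, while "complete.obs" drops the incomplete pair first, which should match computing the correlation by hand on the remaining rows.

```r
# Illustrative vectors (hypothetical, not the question's data); x has one NA
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 9, 10)

# Default use = "everything": the NA propagates to the result
r_everything <- cor(x, y)

# use = "complete.obs": the pair containing NA (row 5) is dropped first
r_complete <- cor(x, y, use = "complete.obs")

# Same computation done by hand on the four complete pairs
r_manual <- cor(x[1:4], y[1:4])

print(r_everything)
print(all.equal(r_complete, r_manual))
```

The same casewise deletion is applied within each group when the call sits inside summarize().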
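If some groups have no complete pairs (for example, all NA), use "na.or.complete" instead: "complete.obs" raises an error when there are no complete cases, whereas "na.or.complete" returns NA for that group (updated data from the comments, with a group containing only NA).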
xx %>%
  group_by(group) %>%
  summarize(COR = cor(a, b, use = "na.or.complete"))
# A tibble: 5 × 2
  group     COR
  <int>   <dbl>
1     1  0.0345
2     2 -0.397
3     3  0.150
4     4  0.376
5     5 NA
An if/else condition combined with "complete.obs" returns the same result:
xx %>%
  group_by(group) %>%
  summarize(COR = if (any(complete.cases(a, b)))
    cor(a, b, use = "complete.obs") else NA_real_)
# A tibble: 5 × 2
  group     COR
  <int>   <dbl>
1     1  0.0345
2     2 -0.397
3     3  0.150
4     4  0.376
5     5 NA
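To see why the all-NA group needs special handling, the two options can be compared directly on a pair of vectors with no complete cases. This is only a sketch: a and b below are made-up stand-ins for one all-NA group, and the tryCatch wrapper is there just so the script keeps running after the error.

```r
# Hypothetical stand-in for a group where every value of a is missing
a <- c(NA_real_, NA_real_, NA_real_)
b <- c(1, 2, 3)

# "complete.obs" stops with an error when no complete pair exists;
# tryCatch turns that error into a marker string for inspection
res_complete <- tryCatch(
  cor(a, b, use = "complete.obs"),
  error = function(e) "error"
)

# "na.or.complete" returns NA in the same situation
res_na_or <- cor(a, b, use = "na.or.complete")

# The manual guard used with summarize() gives NA as well
res_guard <- if (any(complete.cases(a, b))) {
  cor(a, b, use = "complete.obs")
} else {
  NA_real_
}

print(res_complete)
print(res_na_or)
print(res_guard)
```

Either approach avoids the error inside summarize(); "na.or.complete" is simply the shorter way to write it.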