R calculate the correlation coefficient

Time:10-17

I have a data frame with 3 variables: "age", "confidence" and "countryname". I want to compare the correlation between age and confidence in different countries, so I wrote the following command to calculate the correlation coefficient.

correlate <- evs %>% group_by(countryname) %>% summarise(c = cor(age, confidence))

But I found that there are a lot of missing values in the output column "c". Does that mean there is little correlation between the IV and the DV for these countries, or is there something wrong with my commands?

CodePudding user response:

An NA in the correlation matrix means that you have NA values (i.e. missing values) in your observations. The default behaviour of cor is to return a correlation of NA "whenever one of its contributing observations is NA" (from the manual).

That means that a single NA in the data will give a correlation of NA, even when you have only one NA among a thousand otherwise usable observations.

What you can do from here:

  1. You should investigate these NAs, count them and determine whether your data set contains enough usable data. Find out which variables are affected by NAs and to what extent.
  2. Add the argument use when calling cor. This way you specify how the algorithm shall handle missing values. Check out the manual (with ?cor) to find out what options you have. In your case I would just use use="complete.obs". With only 2 variables, most (but not all) options will yield the same result.
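For the first point, a minimal sketch of how the NAs could be counted per country. The small `evs` data frame below is a made-up stand-in for the real data (only the column names come from the question), and the output column names are illustrative:

```r
library(dplyr)

# Hypothetical stand-in for the real `evs` data frame (values invented)
evs <- data.frame(
  countryname = rep(c("A", "B"), each = 5),
  age         = c(20, 30, 40, 50, 60, 25, 35, NA, 55, 65),
  confidence  = c(1, 2, NA, 3, 4, 2, 1, 3, 4, 2)
)

# Per country: number of rows, NAs in each variable, and usable pairs
na_summary <- evs %>%
  group_by(countryname) %>%
  summarise(
    n_rows        = n(),
    na_age        = sum(is.na(age)),
    na_confidence = sum(is.na(confidence)),
    n_complete    = sum(complete.cases(age, confidence))
  )

na_summary
```

Countries where `n_complete` is much smaller than `n_rows` are the ones where a correlation computed on complete observations should be read with caution.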

Some more explanation:

age <- 18:35
confidence <- (age - 17) / 10 + rnorm(length(age))
cor(age, confidence)
#> [1] 0.3589942

Above is the correlation with all the data. Now let's set a few NAs and try again:

confidence[c(1, 6, 11, 16)] <- NA
cor(age, confidence) # the use argument is implicitly "everything"
#> [1] NA

This gives NA because some confidence values are NA. The next statement still gives a result:

cor(age, confidence, use="complete.obs")
#> [1] 0.3130549

Created on 2021-10-16 by the reprex package (v2.0.1)
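Applied to the grouped pipeline from the question, this could look roughly like the sketch below. The toy `evs` data frame is invented for illustration. Note one assumption that differs from the single-vector example: according to ?cor, use = "complete.obs" raises an error when a group has no complete cases at all, so this sketch uses "na.or.complete", which is documented to return NA in that situation instead.

```r
library(dplyr)

# Hypothetical stand-in for the real `evs` data frame (values invented);
# country "C" has no usable confidence values at all
evs <- data.frame(
  countryname = rep(c("A", "B", "C"), each = 4),
  age         = c(20, 30, 40, 50, 25, 35, 45, 55, 22, 32, 42, 52),
  confidence  = c(1, 2, NA, 4, 4, 3, 2, NA, NA, NA, NA, NA)
)

correlate <- evs %>%
  group_by(countryname) %>%
  summarise(
    # number of observations the correlation is actually based on
    n_complete = sum(complete.cases(age, confidence)),
    # drop incomplete pairs; NA (not an error) if none remain
    c = cor(age, confidence, use = "na.or.complete")
  )

correlate
```

Keeping `n_complete` next to `c` makes it visible when a coefficient rests on only a handful of observations, and a remaining NA in `c` then points to a country with no usable pairs rather than to a weak correlation.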
