Running a spearman test with a few columns of values in a csv that have #DIV!. It works! But is it s

Time:07-23

Everything is running fine, but I'm checking to make sure that non-numeric values don't invalidate the test. I converted the variables to numeric using as.numeric and it returned the warning "NAs introduced by coercion", but it worked!

I'm running this line of code on a file of county-level 2020 presidential election data and unemployment data.

cor.test(Unemployment2020, PercentD2020, method = 'spearman', exact = FALSE)

Does the "exact = FALSE" piece make it unnecessary for there to be the same number of numeric values for each variable?

CodePudding user response:

This has nothing to do with exact = FALSE.
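For context, here is a minimal sketch of what exact does control. According to the cor.test documentation, it only decides whether an exact p-value is computed for Kendall's tau and Spearman's rho. The vectors u and v below are made up for illustration and contain no ties or missing values.

```r
# Made-up, tie-free vectors used only for illustration
u <- c(1, 2, 3, 4, 5, 6)
v <- c(2, 1, 4, 3, 6, 5)

exact_res  <- cor.test(u, v, method = "spearman", exact = TRUE)
approx_res <- cor.test(u, v, method = "spearman", exact = FALSE)

# The estimate of rho is the same either way
exact_res$estimate
approx_res$estimate

# Only the way the p-value is calculated differs
exact_res$p.value
approx_res$p.value
```

Neither setting changes which observations enter the test.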

Since cor.test is an S3 generic, passing two numeric vectors to it invokes the stats:::cor.test.default method. If you review the source code of this function, you will see that lines 10 to 13 of the function body silently drop the NA values:

    OK <- complete.cases(x, y)
    x <- x[OK]
    y <- y[OK]
    n <- length(x)

The complete.cases(x, y) call here flags the positions where neither vector is NA, so only matching pairs in which both values are present are used in the test.
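To make that filtering step visible on its own, here is a small sketch. The vectors a and b are hypothetical and chosen only so that each has an NA in a different position.

```r
# Hypothetical vectors, chosen only for illustration
a <- c(10, NA, 30, 40)
b <- c(1, 2, NA, 4)

# TRUE only at positions where both a and b are non-missing
keep <- complete.cases(a, b)
keep

# The pairs that would remain after filtering
a[keep]
b[keep]

# Number of pairs left
sum(keep)
```

A pair is dropped as soon as either side is missing, so the two vectors always end up the same length.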

We can see this in action with the following example. Suppose we have an x and a y vector and want to run cor.test, but each has an NA value at a different point:

x <- c(1, 2, NA, 3, 4, 5)
y <- c(1.1, 1.9, 7, 3.3, 4.5, NA)

cor.test(x, y)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  x and y
#> t = 13.671, df = 2, p-value = 0.005308
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.7634907 0.9998944
#> sample estimates:
#>       cor 
#> 0.9946918

We should get the same result if we drop the third entry from each vector (since x has an NA there) and the sixth entry from each vector (since y has an NA there):

x <- c(1, 2, 3, 4)
y <- c(1.1, 1.9, 3.3, 4.5)

cor.test(x, y)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  x and y
#> t = 13.671, df = 2, p-value = 0.005308
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.7634907 0.9998944
#> sample estimates:
#>       cor 
#> 0.9946918

Created on 2022-07-22 by the reprex package (v2.0.1)
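The same check can be sketched for the situation in the question: text columns coerced with as.numeric and then passed to a Spearman test. The values and the names unemp_raw and dem_raw are invented, and "#DIV/0!" is an assumed spelling of the spreadsheet error marker, so treat this as an illustration rather than a reproduction of the real data.

```r
# Invented raw columns as they might be read from a CSV;
# "#DIV/0!" is an assumed spelling of the spreadsheet error marker
unemp_raw <- c("3.1", "#DIV/0!", "5.4", "4.2", "6.8", "2.9")
dem_raw   <- c("41.0", "55.2", "#DIV/0!", "47.5", "60.1", "45.0")

# Non-numeric strings become NA; suppressWarnings() hides the coercion warning
unemp <- suppressWarnings(as.numeric(unemp_raw))
dem   <- suppressWarnings(as.numeric(dem_raw))

# Pairs in which both values are present
ok <- complete.cases(unemp, dem)
n_used <- sum(ok)
n_used

# Test on the vectors containing NA, and on the manually filtered pairs
res_all    <- cor.test(unemp, dem, method = "spearman", exact = FALSE)
res_manual <- cor.test(unemp[ok], dem[ok], method = "spearman", exact = FALSE)

# Both calls should report the same estimate of rho
all.equal(unname(res_all$estimate), unname(res_manual$estimate))
```

Checking sum(complete.cases(...)) before running the test is a quick way to see how many county pairs actually contribute to the result.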
