How do I generate a series of tables displaying counts for each unique pairing of a set of binary nu-CodePudding

I'm trying to get R to generate a series of tables that show the distribution of values for each unique pairing of a set of 10 binary variables (values are 0, 1 or NA). Then I want to run a chi-square test of independence on each of those tables. I could just run the individual table and chi-square commands:

TAB1_2 = table(var1, var2)
CHI1_2 = chisq.test(TAB1_2, correct = TRUE)

TAB1_3 = table(var1, var3)
CHI1_3 = chisq.test(TAB1_3, correct = TRUE)

TAB1_4 = table(var1, var4)
CHI1_4 = chisq.test(TAB1_4, correct = TRUE)

and so on, but it's tedious. Is there a way I can run some kind of loop to do this?

Here's a fictitious dataset that is similar in structure to the one I'm using:

data = structure(list(var1 = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1), var2 = c(1, 
0, 0, NA, 1, 1, 1, 0, 1, 0), var3 = c(1, 0, 0, 1, 0, 1, 1, 0, 
0, 1), var4 = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1), var5 = c(1, 0, 
1, 1, 1, 1, 0, 0, 1, 0), var6 = c(0, 1, 0, 0, NA, 0, 1, 0, 0, 
1), var7 = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0), var8 = c(1, 1, 0, 
1, 0, 0, 1, 1, 0, 0), var9 = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0), 
    var10 = c(1, 1, 0, 0, 1, 0, 0, 0, NA, 1)), row.names = c(NA, 
10L), class = "data.frame")

Help would be much appreciated!

CodePudding user response:

You can use lapply() to "loop" through all columns. The result would be a list of length = ncol(data).

lapply(data, function(x) chisq.test(x = data$var1, y = x, correct = TRUE))

Output

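Each list element is named after the second variable in the test. Note that the first entry is var1 against itself, and that this call only tests var1 against the other columns; it does not cover every unique pairing.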

$var1

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 6.4, df = 1, p-value = 0.01141


$var2

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0.95063, df = 1, p-value = 0.3296


$var3

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0, df = 1, p-value = 1


$var4

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0.625, df = 1, p-value = 0.4292


$var5

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0.41667, df = 1, p-value = 0.5186


$var6

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0.05625, df = 1, p-value = 0.8125


$var7

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0, df = 1, p-value = 1


$var8

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0, df = 1, p-value = 1


$var9

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0, df = 1, p-value = 1


$var10

    Pearson's Chi-squared test with Yates' continuity correction

data:  data$var1 and x
X-squared = 0.14062, df = 1, p-value = 0.7077
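
If the var1-against-var1 entry is unwanted, one possible variation is to drop var1 from the columns being looped over and then pull out just the p-values. This is only a sketch built on the toy data from the question: the names others, tests and p_values are made up for illustration, and suppressWarnings() is there only because counts this small make chisq.test() warn that the approximation may be incorrect.

```r
# Toy data copied from the question
data <- data.frame(
  var1  = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  var2  = c(1, 0, 0, NA, 1, 1, 1, 0, 1, 0),
  var3  = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  var4  = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1),
  var5  = c(1, 0, 1, 1, 1, 1, 0, 0, 1, 0),
  var6  = c(0, 1, 0, 0, NA, 0, 1, 0, 0, 1),
  var7  = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  var8  = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 0),
  var9  = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  var10 = c(1, 1, 0, 0, 1, 0, 0, 0, NA, 1)
)

# Assumption: var1 is the reference column, so leave it out of the loop
others <- data[setdiff(names(data), "var1")]

# One test per remaining column; rows with NA in either column are dropped
tests <- lapply(others, function(x) {
  suppressWarnings(chisq.test(x = data$var1, y = x, correct = TRUE))
})

# Named numeric vector of p-values, one per compared column
p_values <- vapply(tests, function(res) res$p.value, numeric(1))
round(p_values, 4)
```

The full test objects stay available in tests, so tests$var2 still prints the complete result for var1 against var2.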

CodePudding user response:

We can use combn to get all unique two-column combinations (as column indices). Then we can use map2 to pass each pair of indices to a function that builds a table, which gives a list of tables. Next, we apply chisq.test to each table, and finally use set_names to name each list item after its pair of columns. Note that the resulting table.list holds the test results, not the tables themselves.

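library(tidyverse)

# All unique pairs of column indices, one pair per row (columns V1 and V2)
combos <- as.data.frame(t(combn(seq_len(ncol(data)), 2)))

# Cross-tabulate each pair, test each table, and name the results TAB<i>_<j>
table.list <-
  map2(combos$V1, combos$V2, ~ table(data[, c(.x, .y)])) %>%
  map(~ chisq.test(.x, correct = TRUE)) %>%
  set_names(paste0("TAB", combos$V1, "_", combos$V2))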

Or using base R (this reuses combos from above, and the \(x) lambda shorthand needs R 4.1 or later):

setNames(lapply(
  mapply(
    function(X, Y)
      table(data[[X]], data[[Y]]),
    X = combos$V1,
    Y = combos$V2,
    SIMPLIFY = FALSE
  ),
  \(x) chisq.test(x, correct = TRUE)
),
paste0("TAB", combos$V1, "_", combos$V2))

Output

head(table.list)

$TAB1_2

    Pearson's Chi-squared test with Yates' continuity correction

data:  .x
X-squared = 0.95063, df = 1, p-value = 0.3296


$TAB1_3

    Pearson's Chi-squared test with Yates' continuity correction

data:  .x
X-squared = 0, df = 1, p-value = 1


$TAB1_4

    Pearson's Chi-squared test with Yates' continuity correction

data:  .x
X-squared = 0.625, df = 1, p-value = 0.4292


$TAB1_5

    Pearson's Chi-squared test with Yates' continuity correction

data:  .x
X-squared = 0.41667, df = 1, p-value = 0.5186


$TAB1_6

    Pearson's Chi-squared test with Yates' continuity correction

data:  .x
X-squared = 0.05625, df = 1, p-value = 0.8125


$TAB1_7

    Pearson's Chi-squared test with Yates' continuity correction

data:  .x
X-squared = 0, df = 1, p-value = 1
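
Since the question asks for the tables as well as the tests, here is one way the same idea could be arranged so that both are kept, with the results also collected into a single data frame. Treat it as a sketch on the toy data from the question rather than a definitive version: pair_list, tabs, tests and results are names invented here, pairs are labelled by column name instead of the TAB1_2 style, and suppressWarnings() only hides the small-count approximation warning.

```r
# Toy data copied from the question
data <- data.frame(
  var1  = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  var2  = c(1, 0, 0, NA, 1, 1, 1, 0, 1, 0),
  var3  = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  var4  = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1),
  var5  = c(1, 0, 1, 1, 1, 1, 0, 0, 1, 0),
  var6  = c(0, 1, 0, 0, NA, 0, 1, 0, 0, 1),
  var7  = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  var8  = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 0),
  var9  = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  var10 = c(1, 1, 0, 0, 1, 0, 0, 0, NA, 1)
)

# Every unique pair of column names, as a list of length-2 character vectors
pair_list <- combn(names(data), 2, simplify = FALSE)
names(pair_list) <- vapply(pair_list, paste, character(1), collapse = "_")

# Keep the contingency tables (table() drops NA rows by default)
tabs <- lapply(pair_list, function(p) table(data[[p[1]]], data[[p[2]]]))

# Run the test on each stored table
tests <- lapply(tabs, function(tb) {
  suppressWarnings(chisq.test(tb, correct = TRUE))
})

# One row per pair: statistic and p-value
results <- data.frame(
  pair      = names(tests),
  statistic = vapply(tests, function(res) unname(res$statistic), numeric(1)),
  p_value   = vapply(tests, function(res) res$p.value, numeric(1)),
  row.names = NULL
)
head(results)
```

With this layout, tabs$var1_var2 gives the counts and tests$var1_var2 gives the matching test, so nothing has to be recomputed to inspect a particular pair.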