I'm trying to get R to generate a series of tables showing the distribution of values for each unique pairing of a set of 10 binary variables (values are 0, 1, or NA). Then I want to run a chi-square test of independence on each of those tables. I could just run individual table and chi-square commands:
TAB1_2 = table(var1, var2)
CHI1_2 = chisq.test(TAB1_2, correct = TRUE)
TAB1_3 = table(var1, var3)
CHI1_3 = chisq.test(TAB1_3, correct = TRUE)
TAB1_4 = table(var1, var4)
CHI1_4 = chisq.test(TAB1_4, correct = TRUE)
and so on, but it's tedious. Is there a way I can run some kind of loop to do this?
Here's a fictitious dataset that is similar in structure to the one I'm using:
data = structure(list(var1 = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1), var2 = c(1,
0, 0, NA, 1, 1, 1, 0, 1, 0), var3 = c(1, 0, 0, 1, 0, 1, 1, 0,
0, 1), var4 = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1), var5 = c(1, 0,
1, 1, 1, 1, 0, 0, 1, 0), var6 = c(0, 1, 0, 0, NA, 0, 1, 0, 0,
1), var7 = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0), var8 = c(1, 1, 0,
1, 0, 0, 1, 1, 0, 0), var9 = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
var10 = c(1, 1, 0, 0, 1, 0, 0, 0, NA, 1)), row.names = c(NA,
10L), class = "data.frame")
Help would be much appreciated!
CodePudding user response:
You can use lapply() to "loop" through all columns, testing var1 against each one. The result is a list of length ncol(data).
lapply(data, function(x) chisq.test(x = data$var1, y = x, correct = TRUE))
Output
The names in the list identify the second variable in each test. Note that the first entry is var1 against var1.
$var1
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 6.4, df = 1, p-value = 0.01141
$var2
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0.95063, df = 1, p-value = 0.3296
$var3
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0, df = 1, p-value = 1
$var4
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0.625, df = 1, p-value = 0.4292
$var5
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0.41667, df = 1, p-value = 0.5186
$var6
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0.05625, df = 1, p-value = 0.8125
$var7
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0, df = 1, p-value = 1
$var8
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0, df = 1, p-value = 1
$var9
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0, df = 1, p-value = 1
$var10
Pearson's Chi-squared test with Yates' continuity correction
data: data$var1 and x
X-squared = 0.14062, df = 1, p-value = 0.7077
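If you only need the p-values from a list like the one above, they can be pulled out of each result. This is a minimal sketch, assuming the same example data from the question; the names others, tests and p_values are just illustrative. It also drops the var1 against var1 comparison. With only 10 rows, chisq.test() will likely warn that the approximation may be incorrect, which does not stop the code.

```r
# Example data from the question
data <- data.frame(
  var1  = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  var2  = c(1, 0, 0, NA, 1, 1, 1, 0, 1, 0),
  var3  = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  var4  = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1),
  var5  = c(1, 0, 1, 1, 1, 1, 0, 0, 1, 0),
  var6  = c(0, 1, 0, 0, NA, 0, 1, 0, 0, 1),
  var7  = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  var8  = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 0),
  var9  = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  var10 = c(1, 1, 0, 0, 1, 0, 0, 0, NA, 1)
)

# Every column except var1, so var1 is not tested against itself
others <- setdiff(names(data), "var1")

# One chi-square test per remaining column, each against var1
tests <- lapply(data[others], function(x) {
  chisq.test(x = data$var1, y = x, correct = TRUE)
})

# Each result is an "htest" object; p.value is one of its components
p_values <- vapply(tests, function(res) res$p.value, numeric(1))
round(p_values, 4)
```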
CodePudding user response:
We can use combn to get all unique two-column combinations (by column index). Then we can use map2 to pass each pair of column indices to a function that creates a table, which gives a list of tables. Next, we apply chisq.test to each table, and finally use set_names to assign a name to each list item.
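To see what combn produces before it is used on the real data, here is a small illustration on 4 indices instead of 10 (the name pairs is only for this example):

```r
# All unique pairs drawn from the indices 1..4.
# Each COLUMN of the result is one pair.
pairs <- combn(4, 2)
pairs

# Transposed, each ROW is one pair, which is the layout used for combos
t(pairs)

# With 10 columns there are choose(10, 2) = 45 unique pairs
ncol(combn(10, 2))
```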
library(tidyverse)

combos <- as.data.frame(t(combn(c(1:ncol(data)), 2)))

table.list <-
  map2(combos$V1, combos$V2, ~ table(data[, c(.x, .y)])) %>%
  map(., ~ chisq.test(.x, correct = TRUE)) %>%
  set_names(., paste0("TAB", combos$V1, "_", combos$V2))
Or using base R (the \(x) shorthand for an anonymous function requires R 4.1 or later):
setNames(
  lapply(
    mapply(
      function(X, Y)
        table(data[[X]], data[[Y]]),
      X = combos$V1,
      Y = combos$V2,
      SIMPLIFY = FALSE
    ),
    \(x) chisq.test(x, correct = TRUE)
  ),
  paste0("TAB", combos$V1, "_", combos$V2)
)
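Both snippets above keep only the test results. Since the question asks for the tables as well, one possible variation is to keep two parallel lists, one of tables and one of tests. This is a sketch in base R that pairs columns by name rather than by index; pairs, pair_names, tables and tests are names I made up for illustration.

```r
# Example data from the question
data <- data.frame(
  var1  = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  var2  = c(1, 0, 0, NA, 1, 1, 1, 0, 1, 0),
  var3  = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  var4  = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1),
  var5  = c(1, 0, 1, 1, 1, 1, 0, 0, 1, 0),
  var6  = c(0, 1, 0, 0, NA, 0, 1, 0, 0, 1),
  var7  = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  var8  = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 0),
  var9  = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  var10 = c(1, 1, 0, 0, 1, 0, 0, 0, NA, 1)
)

# List of length-2 character vectors, one per unique pair of column names
pairs <- combn(names(data), 2, simplify = FALSE)
pair_names <- vapply(pairs, paste, character(1), collapse = "_")

# One 2x2 table per pair; table() drops rows with NA by default.
# dnn labels the table dimensions with the two variable names.
tables <- setNames(
  lapply(pairs, function(p) table(data[[p[1]]], data[[p[2]]], dnn = p)),
  pair_names
)

# One chi-square test per table, with Yates' continuity correction
tests <- lapply(tables, chisq.test, correct = TRUE)

tables$var1_var2
tests$var1_var2
```

Both lists share the same names, so tables$var1_var4 and tests$var1_var4 always refer to the same pair.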
Output
head(table.list)
$TAB1_2
Pearson's Chi-squared test with Yates' continuity correction
data: .x
X-squared = 0.95063, df = 1, p-value = 0.3296
$TAB1_3
Pearson's Chi-squared test with Yates' continuity correction
data: .x
X-squared = 0, df = 1, p-value = 1
$TAB1_4
Pearson's Chi-squared test with Yates' continuity correction
data: .x
X-squared = 0.625, df = 1, p-value = 0.4292
$TAB1_5
Pearson's Chi-squared test with Yates' continuity correction
data: .x
X-squared = 0.41667, df = 1, p-value = 0.5186
$TAB1_6
Pearson's Chi-squared test with Yates' continuity correction
data: .x
X-squared = 0.05625, df = 1, p-value = 0.8125
$TAB1_7
Pearson's Chi-squared test with Yates' continuity correction
data: .x
X-squared = 0, df = 1, p-value = 1
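Printing 45 test results one by one gets long, so it may be easier to collect them into a single data frame. The following is only a sketch of one way to do that in base R, using the example data from the question; the results data frame and its column names are hypothetical.

```r
# Example data from the question
data <- data.frame(
  var1  = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 1),
  var2  = c(1, 0, 0, NA, 1, 1, 1, 0, 1, 0),
  var3  = c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1),
  var4  = c(1, 0, 1, 0, 1, 1, 1, 1, 1, 1),
  var5  = c(1, 0, 1, 1, 1, 1, 0, 0, 1, 0),
  var6  = c(0, 1, 0, 0, NA, 0, 1, 0, 0, 1),
  var7  = c(1, 1, 0, 1, 0, 0, 1, 0, 1, 0),
  var8  = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 0),
  var9  = c(0, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  var10 = c(1, 1, 0, 0, 1, 0, 0, 0, NA, 1)
)

# One row per unique pair of column names
pairs <- t(combn(names(data), 2))
results <- data.frame(
  x = pairs[, 1],
  y = pairs[, 2],
  statistic = NA_real_,
  p_value = NA_real_,
  stringsAsFactors = FALSE
)

# Fill in the test statistic and p-value for each pair
for (i in seq_len(nrow(results))) {
  tab <- table(data[[results$x[i]]], data[[results$y[i]]])
  res <- chisq.test(tab, correct = TRUE)
  results$statistic[i] <- res$statistic
  results$p_value[i] <- res$p.value
}

# Pairs with the smallest p-values first
head(results[order(results$p_value), ])
```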