I want to create a 2X2 table for Chisq test from multiple levels categorical dataset

Time:03-16

I have a dataset of race and a binary outcome (Y/N). I want to build a 2x2 table for each race so that I can run a chi-squared test on it.

  Asian     584   24
  Black    1721   56
  Hispanic 2400   90
  White    8164  289

For each race, the first row of the 2x2 table would hold that race's counts (e.g. Asian) and the second row the counts for everyone else (non-Asian), computed as the column total minus the Asian count in each column. Repeating this for every race would let me run a chi-squared test per race. Is there an easier way to run a chi-squared test for each race in the table above?

CodePudding user response:

Here is one option with purrr: in each table the first row is an individual race and the second row is the sum of all the others. Each list element is named after its race.

library(tidyverse)

# For each race: its own counts in row 1, the summed counts of all other races in row 2
pull(df, V1) %>%
  set_names() %>%   # name each list element after its race
  map(function(x)
    bind_rows(
      df %>% filter(V1 == x) %>% dplyr::select(-V1),
      df %>% filter(V1 != x) %>% summarise(across(-V1, sum))
    ))

Output

$Asian
     V2  V3
1   584  24
2 12285 435

$Black
     V2  V3
1  1721  56
2 11148 403

$Hispanic
     V2  V3
1  2400  90
2 10469 369

$White
    V2  V3
1 8164 289
2 4705 170

Data

df <- structure(list(V1 = c("Asian", "Black", "Hispanic", "White"), 
    V2 = c(584L, 1721L, 2400L, 8164L), V3 = c(24L, 56L, 90L, 
    289L)), class = "data.frame", row.names = c(NA, -4L))
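
The list above holds the 2x2 tables but stops short of the test itself. A minimal base R sketch of that last step, assuming the same data as df; the names counts, tables and tests are illustrative and do not come from the answer:

```r
# Same data as in the answer (V2 and V3 are the two outcome counts)
df <- data.frame(
  V1 = c("Asian", "Black", "Hispanic", "White"),
  V2 = c(584L, 1721L, 2400L, 8164L),
  V3 = c(24L, 56L, 90L, 289L)
)

counts <- as.matrix(df[, -1])
rownames(counts) <- df$V1

# One 2x2 matrix per race: that race's row, then the column sums of all other rows
tables <- lapply(rownames(counts), function(r) {
  rest <- colSums(counts[rownames(counts) != r, , drop = FALSE])
  rbind(counts[r, ], rest)
})
names(tables) <- rownames(counts)

# chisq.test() accepts a matrix of counts; run it once per race
tests <- lapply(tables, chisq.test)
tests$Asian
```

If you start from the data frames produced by the tidyverse code instead, converting each one with as.matrix() before calling chisq.test() is a safe choice.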

CodePudding user response:

Here is another approach. First set up the master table:

tbl <- as.matrix(df[, -1])
Sums <- matrix(colSums(tbl), nrow(tbl), 2, byrow=TRUE)
Tbl <- cbind(tbl, Sums-tbl)
row.names(Tbl) <- df[, 1]
Tbl
#           Yes  No   Yes  No
# Asian     584  24 12285 435
# Black    1721  56 11148 403
# Hispanic 2400  90 10469 369
# White    8164 289  4705 170

Now a function to create 2x2 tables from a row in Tbl:

ChiSqTable <- function(row) {
    matrix(Tbl[row, ], 2, 2, byrow=TRUE, dimnames=list(Race=c(df[row, 1],
         paste("Not", df[row, 1])), Question=c("Yes", "No")))
}

Finally create Chi Square tables and run the test:

Tables <- lapply(seq_len(nrow(Tbl)), ChiSqTable)
names(Tables) <- df[, 1]
ChiSqStats <- lapply(Tables, chisq.test)   # names carry over from Tables

Tables[[1]]   # or Tables[["Asian"]]
#            Question
# Race          Yes  No
#   Asian       584  24
#   Not Asian 12285 435
ChiSqStats[[1]]
# 
#   Pearson's Chi-squared test with Yates' continuity correction
# 
# data:  X[[i]]
# X-squared = 0.33997, df = 1, p-value = 0.5598
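
The result above includes Yates' continuity correction, which chisq.test() applies by default to 2x2 tables. If the uncorrected Pearson statistic is wanted instead, passing correct = FALSE should turn the correction off. A small sketch on the Asian table, with the counts typed in by hand and the name asian chosen only for illustration:

```r
# 2x2 table for Asian vs. everyone else, counts copied from the output above
asian <- matrix(c(584, 24, 12285, 435), nrow = 2, byrow = TRUE,
                dimnames = list(Race = c("Asian", "Not Asian"),
                                Question = c("Yes", "No")))

corrected   <- chisq.test(asian)                   # default: Yates' correction
uncorrected <- chisq.test(asian, correct = FALSE)  # plain Pearson chi-squared

corrected$method
uncorrected$method
```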

Access the remaining tables and test results by number or by race name. Every component of each chi-squared test result is saved, e.g.

ChiSqStats[[1]]$expected
#            Question
# Race               Yes        No
#   Asian       587.0612  20.93878
#   Not Asian 12281.9388 438.06122
ChiSqStats[[1]]$residuals
#            Question
# Race                Yes         No
#   Asian     -0.12634367  0.6689899
#   Not Asian  0.02762242 -0.1462607
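
Because each test result is a list, the statistics and p-values can also be gathered into one data frame for a side-by-side view. The sketch below is self-contained and rebuilds the tests from the counts. The names stats and summary_tbl are hypothetical, and the Holm adjustment is an optional extra that the answer does not use; it is included only because one test is run per race.

```r
df <- data.frame(
  V1 = c("Asian", "Black", "Hispanic", "White"),
  V2 = c(584L, 1721L, 2400L, 8164L),
  V3 = c(24L, 56L, 90L, 289L)
)

counts <- as.matrix(df[, -1])
totals <- colSums(counts)

# One test per race: that race's counts vs. the column totals minus those counts
stats <- lapply(seq_len(nrow(counts)), function(i) {
  chisq.test(rbind(counts[i, ], totals - counts[i, ]))
})
names(stats) <- df$V1

# Pull the statistic and p-value out of each result
summary_tbl <- data.frame(
  race      = names(stats),
  statistic = sapply(stats, function(s) unname(s$statistic)),
  p_value   = sapply(stats, function(s) s$p.value),
  row.names = NULL
)

# Optional: adjust for running four tests (Holm method)
summary_tbl$p_holm <- p.adjust(summary_tbl$p_value, method = "holm")
summary_tbl
```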