Multiple columns in categorical Chi-Square

Time:12-31

I am working on a chi-square analysis in R. I have many subjects, each with previously determined boolean values:

# header > species   is_reptile  is_animal  is_alive
# 1 >      lizard    yes          yes        yes    
# 2 >      snake     yes          yes        yes    
# 3 >      cat       no           yes        yes    
# 4 >      flower    no           no         yes

I want to perform a test (I believe a chi-square, but I am not sure) to determine how each of these previous tests is linked to the others.

I previously used this R code; however, it only compares two columns at a time and does not work with all the columns as I would like:

chisq.test(data$is_reptile, data$is_animal)

# > Pearson's Chi-squared test with Yates' continuity correction
# > data:  data$is_reptile and data$is_animal
# > X-squared = 0, df = 1, p-value = 1

Is there a test (something like chi_square(data, data$species)) that can show a table similar to a Pearson correlation matrix?

            is_reptile    is_animal    is_alive
is_reptile  1.0           0.05         0.5
is_animal   0.05          1.0          0.05
is_alive    0.5           0.05         1.0

CodePudding user response:

Something like this?
Reshape the data to long format, table it and run the chi-squared test.

library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr, not dplyr

df1 %>%
  pivot_longer(-1) %>%
  select(-1) %>%
  table() -> tbl1

tbl1
#            value
#name         no yes
#  is_alive    0   4
#  is_animal   1   3
#  is_reptile  2   2

chisq.test(tbl1)
#
#   Pearson's Chi-squared test
#
#data:  tbl1
#X-squared = 2.6667, df = 2, p-value = 0.2636
#
#Warning message:  
#In chisq.test(tbl1) : Chi-squared approximation may be incorrect
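
The warning above appears because the expected counts are very small. As a rough sketch of how one might check that and work around it, the block below rebuilds the same 3 x 2 table in base R, inspects the expected counts, and then tries two alternatives that do not rely on the large-sample approximation: Fisher's exact test and a simulated p-value. The name `counts` is only illustrative, and swapping in these alternatives is a suggestion on my part, not something the approach above requires.

```r
# Same four-row example data as in this answer
df1 <- data.frame(
  species    = c("lizard", "snake", "cat", "flower"),
  is_reptile = c("yes", "yes", "no", "no"),
  is_animal  = c("yes", "yes", "yes", "no"),
  is_alive   = c("yes", "yes", "yes", "yes")
)

# One row per boolean column, with the number of "no" and "yes" answers
counts <- t(vapply(
  df1[-1],
  function(col) c(no = sum(col == "no"), yes = sum(col == "yes")),
  FUN.VALUE = c(no = 0, yes = 0)
))

# Expected counts under the null hypothesis; values below 5 are what
# triggers "Chi-squared approximation may be incorrect"
res <- suppressWarnings(chisq.test(counts))
res$expected

# Alternative 1 (assumption: an exact test suits such a small sample)
fisher_p <- fisher.test(counts)$p.value

# Alternative 2: Monte Carlo p-value instead of the chi-squared approximation
set.seed(1)
sim_p <- chisq.test(counts, simulate.p.value = TRUE, B = 10000)$p.value
```

Note that this table only compares the proportion of "yes" answers across the three columns; it does not test whether the columns are associated with each other subject by subject.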

Data

x <- "species   is_reptile  is_animal  is_alive
lizard    yes          yes        yes    
snake     yes          yes        yes    
cat       no           yes        yes    
flower    no           no         yes"

df1 <- read.table(textConnection(x), header = TRUE)

CodePudding user response:

You may stack and table your data before chisq.test.

chisq.test(table(stack(dat[-1])))
#         Pearson's Chi-squared test
# 
# data:  table(stack(dat[-1]))
# X-squared = 0.68182, df = 2, p-value = 0.7111
# 
# Warning message:
# In chisq.test(table(stack(dat[-1]))) :
#   Chi-squared approximation may be incorrect

Using the native pipe (R >= 4.1), same result:

dat[-1] |>
  stack() |>
  table() |>
  chisq.test()

Note: Since you are not sure whether this is the right test for you, perhaps take a look at this related post on Cross Validated.
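
If what you actually want is the pairwise table from the question (one value per pair of columns), one possible sketch is to test every pair of columns separately and collect the p-values in a symmetric matrix. `pairwise_p` is a hypothetical helper written for this example, not an existing R function, and I have assumed Fisher's exact test instead of chisq.test because the counts here are tiny. The diagonal is left as NA, since testing a column against itself is not meaningful.

```r
# Same five-row example data as in this answer
dat <- data.frame(
  species    = c("lizard", "snake", "cat", "flower", "dinosaur"),
  is_reptile = c("yes", "yes", "no", "no", "yes"),
  is_animal  = c("yes", "yes", "yes", "no", "yes"),
  is_alive   = c("yes", "yes", "yes", "yes", "no")
)

# Hypothetical helper: p-value of a 2 x 2 test for every pair of columns
pairwise_p <- function(d, vars, lv = c("no", "yes")) {
  k <- length(vars)
  p <- matrix(NA_real_, k, k, dimnames = list(vars, vars))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # Fixing the factor levels keeps every table 2 x 2
      tab <- table(factor(d[[vars[i]]], levels = lv),
                   factor(d[[vars[j]]], levels = lv))
      # Assumption: exact test; fisher.test could be replaced by chisq.test
      p[i, j] <- p[j, i] <- fisher.test(tab)$p.value
    }
  }
  p
}

p_mat <- pairwise_p(dat, names(dat)[-1])
round(p_mat, 3)
```

With many columns this runs many tests at once, so it may be worth adjusting the p-values for multiple comparisons, for example with p.adjust().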


Data:

dat <- structure(list(species = c("lizard", "snake", "cat", "flower", 
"dinosaur"), is_reptile = c("yes", "yes", "no", "no", "yes"), 
    is_animal = c("yes", "yes", "yes", "no", "yes"), is_alive = c("yes", 
    "yes", "yes", "yes", "no")), class = "data.frame", row.names = c(NA, 
-5L))