I am working on a chi-square analysis in R. I have many subjects, each with several previously determined boolean values:
# header > species is_reptile is_animal is_alive
# 1 > lizard yes yes yes
# 2 > snake yes yes yes
# 3 > cat no yes yes
# 4 > flower no no yes
I want to perform a test (I believe a chi-square test, but I am not sure) to determine how these previously determined values are linked to each other.
I previously used this R code; however, it does not seem to work with all of the columns at once, as I would like:
chisq.test(data$is_reptile, data$is_animal)
# > Pearson's Chi-squared test with Yates' continuity correction
# > data: data$is_reptile and data$is_animal
# > X-squared = 0, df = 1, p-value = 1
Is there a test (something like chi_square(data, data$species)) that can show a table similar to a Pearson correlation matrix?
is_reptile is_animal is_alive
is_reptile 1.0 0.05 0.5
is_animal 0.05 1.0 0.05
is_alive 0.5 0.05 1.0
CodePudding user response:
Something like this?
Reshape the data to long format, table it and run the chi-squared test.
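library(dplyr)
library(tidyr)  # pivot_longer() is exported by tidyr, not dplyr

df1 %>%
  pivot_longer(-1) %>%  # long format: one row per species and variable
  select(-1) %>%        # drop the species column
  table() -> tbl1       # cross-tabulate variable name by value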
tbl1
# value
#name no yes
# is_alive 0 4
# is_animal 1 3
# is_reptile 2 2
chisq.test(tbl1)
#
# Pearson's Chi-squared test
#
#data: tbl1
#X-squared = 2.6667, df = 2, p-value = 0.2636
#
#Warning message:
#In chisq.test(tbl1) : Chi-squared approximation may be incorrect
Data
x <- "species is_reptile is_animal is_alive
lizard yes yes yes
snake yes yes yes
cat no yes yes
flower no no yes"
df1 <- read.table(textConnection(x), header = TRUE)
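The warning in the output above appears because some expected cell counts are small (chisq.test warns when any expected count is below 5). As a rough sketch, and not necessarily the right choice for your data, the expected counts can be inspected and the p-value obtained from an exact or a simulated test instead. The table is rebuilt by hand here so the snippet runs on its own; the seed and the number of replicates B are arbitrary choices.

```r
# Rebuild the 3 x 2 table shown above so this snippet is self-contained.
tbl1 <- as.table(matrix(
  c(0, 1, 2,   # "no" counts for is_alive, is_animal, is_reptile
    4, 3, 2),  # "yes" counts for the same rows
  nrow = 3,
  dimnames = list(name  = c("is_alive", "is_animal", "is_reptile"),
                  value = c("no", "yes"))
))

# Expected counts under independence; small values trigger the warning.
expected <- suppressWarnings(chisq.test(tbl1))$expected
expected

# Option 1: Fisher's exact test, which does not rely on the approximation.
ft <- fisher.test(tbl1)
ft$p.value

# Option 2: Monte Carlo p-value for the chi-squared statistic.
set.seed(1)  # arbitrary seed, only for reproducibility
sim <- chisq.test(tbl1, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

With only four subjects, neither option adds information that is not in the data; they only avoid leaning on the large-sample approximation.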
CodePudding user response:
You may stack and table your data before calling chisq.test.
chisq.test(table(stack(dat[-1])))
# Pearson's Chi-squared test
#
# data: table(stack(dat[-1]))
# X-squared = 0.68182, df = 2, p-value = 0.7111
#
# Warning message:
# In chisq.test(table(stack(dat[-1]))) :
# Chi-squared approximation may be incorrect
Using pipes (same result):
dat[-1] |>
  stack() |>
  table() |>
  chisq.test()
Note: Since you are not sure whether this is the right test for you, perhaps take a look at this related post on Cross Validated.
Data:
dat <- structure(list(species = c("lizard", "snake", "cat", "flower",
"dinosaur"), is_reptile = c("yes", "yes", "no", "no", "yes"),
is_animal = c("yes", "yes", "yes", "no", "yes"), is_alive = c("yes",
"yes", "yes", "yes", "no")), class = "data.frame", row.names = c(NA,
-5L))
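The stacked table compares the yes/no proportions across columns in a single test, so it does not give the pairwise matrix sketched in the question. If a matrix of pairwise p-values is what you are after, one possible sketch is to run a test of independence on every pair of columns. The helper name pairwise_p is hypothetical, and using fisher.test rather than chisq.test is an assumption made here because the counts are tiny. Pairs where a column has only one observed level, and the diagonal, are left as NA.

```r
# Same data as above, repeated so this snippet runs on its own.
dat <- data.frame(
  species    = c("lizard", "snake", "cat", "flower", "dinosaur"),
  is_reptile = c("yes", "yes", "no", "no", "yes"),
  is_animal  = c("yes", "yes", "yes", "no", "yes"),
  is_alive   = c("yes", "yes", "yes", "yes", "no")
)

# Hypothetical helper: p-value of a test of independence for each column pair.
pairwise_p <- function(d, test = fisher.test) {
  vars <- names(d)
  out <- matrix(NA_real_, length(vars), length(vars),
                dimnames = list(vars, vars))
  for (i in vars) {
    for (j in vars) {
      if (i == j) next                     # diagonal stays NA
      tab <- table(d[[i]], d[[j]])
      if (all(dim(tab) >= 2)) {            # need both levels in both columns
        out[i, j] <- test(tab)$p.value
      }
    }
  }
  out
}

p_mat <- pairwise_p(dat[-1])  # drop the species column
p_mat
```

Note that these are p-values, not correlations, so small values suggest an association rather than a strong one, and with this many pairwise tests a multiple-comparison correction (for example p.adjust) may be worth considering.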