I'm still new to R and data analytics in general. I have a data set containing 2 parts:
- 20 questions (answered on a 5-point Likert scale)
- 8 socio-demographic variables
Here is a scaled down sample version of the data set (only contains 3 of the 20 questions and 3 socio-demographic variables) in case it is needed:
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))
- I need to apply a chi-square test that relates each question to each of the socio-demographic variables. For the sample above that is a total of 9 chi-square results: Q1 - ageRange, Q1 - education, Q1 - maritalStatus, Q2 - ageRange, Q2 - education, Q2 - maritalStatus, Q3 - ageRange, Q3 - education, Q3 - maritalStatus
- I want to arrange the results of the chi-square pairings into a data frame or matrix where the columns are the 3 socio-demographic factors and the rows are the 3 questions. It should look something like this (with each 0 replaced by the p-value for the corresponding row and column pair):
data.frame(Age = c(0, 0, 0),
           Education = c(0, 0, 0),
           Married = c(0, 0, 0),
           row.names = c("Q1", "Q2", "Q3"))
I tried using some of the apply functions, but I could not get it to work.
CodePudding user response:
We could do something like this. It is quite verbose, but it may help as a starting point.
In principle, for each Q column we build a new data frame that holds that question together with the socio-demographic variables, run the tests, and bind the results at the end.
The tidy() function from the broom package is quite handy for turning each test result into a one-row data frame:
library(dplyr)
library(tidyr)
library(purrr)   # provides map()
library(broom)

# Q1 against every socio-demographic variable
Q1 <- df %>%
  select(-Q2, -Q3) %>%
  pivot_longer(-Q1) %>%
  nest(data = -name) %>%
  mutate(stats = map(data, ~broom::tidy(chisq.test(.$Q1, .$value)))) %>%
  select(-data) %>%
  unnest(c(stats))

# Q2 against every socio-demographic variable
Q2 <- df %>%
  select(-Q1, -Q3) %>%
  pivot_longer(-Q2) %>%
  nest(data = -name) %>%
  mutate(stats = map(data, ~broom::tidy(chisq.test(.$Q2, .$value)))) %>%
  select(-data) %>%
  unnest(c(stats))

# Q3 against every socio-demographic variable
Q3 <- df %>%
  select(-Q1, -Q2) %>%
  pivot_longer(-Q3) %>%
  nest(data = -name) %>%
  mutate(stats = map(data, ~broom::tidy(chisq.test(.$Q3, .$value)))) %>%
  select(-data) %>%
  unnest(c(stats))

# Stack the three results and label each block with its question
bind_rows(Q1, Q2, Q3, .id = "Q") %>%
  mutate(ID = paste0("Q", Q), .before = 1, .keep = "unused")
  ID    name          statistic p.value parameter method
  <chr> <chr>             <dbl>   <dbl>     <int> <chr>
1 Q1    ageRange          15.6    0.209        12 Pearson's Chi-squared test
2 Q1    education         27.5    0.122        20 Pearson's Chi-squared test
3 Q1    maritalStatus      2.71   0.608         4 Pearson's Chi-squared test
4 Q2    ageRange          15.6    0.209        12 Pearson's Chi-squared test
5 Q2    education         20.8    0.407        20 Pearson's Chi-squared test
6 Q2    maritalStatus      2.71   0.608         4 Pearson's Chi-squared test
7 Q3    ageRange          14.6    0.265        12 Pearson's Chi-squared test
8 Q3    education         21.7    0.359        20 Pearson's Chi-squared test
9 Q3    maritalStatus      3.06   0.549         4 Pearson's Chi-squared test
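The question asks for one row per question and one column per socio-demographic variable, so the long result still needs reshaping. A minimal sketch of that step, assuming the p-values printed above have been collected in a tibble (the name `res` and the object `p_wide` are made up for this illustration; in practice `res` would be the result of the `bind_rows()` call):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for the long result; p-values copied from the output above
res <- tribble(
  ~ID,  ~name,           ~p.value,
  "Q1", "ageRange",      0.209,
  "Q1", "education",     0.122,
  "Q1", "maritalStatus", 0.608,
  "Q2", "ageRange",      0.209,
  "Q2", "education",     0.407,
  "Q2", "maritalStatus", 0.608,
  "Q3", "ageRange",      0.265,
  "Q3", "education",     0.359,
  "Q3", "maritalStatus", 0.549
)

# One row per question, one column per variable, cells filled with p-values
p_wide <- res %>%
  pivot_wider(id_cols = ID, names_from = name, values_from = p.value) %>%
  column_to_rownames("ID")   # turns ID into row names and returns a data.frame

p_wide
```

The columns keep the original variable names (ageRange, education, maritalStatus); they can be renamed afterwards if labels such as Age, Education and Married are preferred.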
CodePudding user response:
We may also loop over the question columns with map() from purrr:
library(purrr)
library(broom)
library(tidyr)
library(stringr)
library(dplyr)

# Pick the question columns (Q1, Q2, ...) and test each against every demographic
str_subset(names(df), "^Q\\d+$") %>%
  map(~ df %>%
        select(all_of(.x), ageRange:maritalStatus) %>%
        pivot_longer(cols = -1) %>%
        group_by(ID = .x, name) %>%
        summarise(stats = tidy(chisq.test(cur_data()[[1]], value)),
                  .groups = "drop")) %>%
  list_rbind() %>%
  unnest(where(is_tibble))
Output:
# A tibble: 9 × 6
  ID    name          statistic p.value parameter method
  <chr> <chr>             <dbl>   <dbl>     <int> <chr>
1 Q1    ageRange          15.6    0.209        12 Pearson's Chi-squared test
2 Q1    education         27.5    0.122        20 Pearson's Chi-squared test
3 Q1    maritalStatus      2.71   0.608         4 Pearson's Chi-squared test
4 Q2    ageRange          15.6    0.209        12 Pearson's Chi-squared test
5 Q2    education         20.8    0.407        20 Pearson's Chi-squared test
6 Q2    maritalStatus      2.71   0.608         4 Pearson's Chi-squared test
7 Q3    ageRange          14.6    0.265        12 Pearson's Chi-squared test
8 Q3    education         21.7    0.359        20 Pearson's Chi-squared test
9 Q3    maritalStatus      3.06   0.549         4 Pearson's Chi-squared test
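Since the question mentions the apply functions, here is a rough base R sketch that builds the requested questions-by-variables matrix of p-values directly with nested sapply() calls. The names `questions`, `demographics` and `p_matrix` are invented for this illustration, and the sample data is repeated so the snippet runs on its own. With only 10 rows the expected cell counts are small, so chisq.test() warns that the approximation may be incorrect; the sketch silences that warning, which may not be what you want on the real data.

```r
# Sample data from the question
df <- data.frame(Q1 = c(1, 2, 2, 1, 3, 4, 3, 5, 2, 2),
                 Q2 = c(2, 3, 5, 5, 4, 5, 1, 1, 5, 3),
                 Q3 = c(4, 4, 2, 3, 2, 1, 1, 1, 5, 5),
                 ageRange = c(2, 3, 1, 1, 3, 4, 4, 2, 1, 1),
                 education = c(1, 1, 3, 4, 6, 5, 3, 2, 1, 4),
                 maritalStatus = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1))

# Hypothetical helper vectors naming the two groups of columns
questions    <- c("Q1", "Q2", "Q3")
demographics <- c("ageRange", "education", "maritalStatus")

# Outer sapply: one column per demographic variable.
# Inner sapply: one p-value per question, named by the question.
p_matrix <- sapply(demographics, function(d) {
  sapply(questions, function(q) {
    # Small-sample warning suppressed for the sketch only
    suppressWarnings(chisq.test(df[[q]], df[[d]]))$p.value
  })
})

round(p_matrix, 3)
```

For the full data set, `questions` and `demographics` would list all 20 question columns and all 8 socio-demographic columns; as.data.frame(p_matrix) converts the result to a data frame if that is preferred.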