I would like to perform a chi-square test in R using dplyr. Specifically, I would like to investigate whether there is a difference in customer churn between male and female customers. Here is a short example of my data.
sex churn
<fct> <lgl>
1 W FALSE
2 W FALSE
3 W FALSE
4 W FALSE
5 W FALSE
6 W FALSE
7 W FALSE
8 W FALSE
9 W FALSE
10 W FALSE
11 W FALSE
12 W FALSE
13 M FALSE
14 W FALSE
15 W FALSE
16 W FALSE
17 W FALSE
18 M FALSE
19 W FALSE
20 W TRUE
21 W TRUE
22 M FALSE
23 M FALSE
24 W TRUE
25 W FALSE
With the summarise and spread functions I already get a nice summary table.
churn_latest %>%
group_by(sex, churn) %>%
summarise(n = n()) %>%
spread(key = sex, value = n)
Now I would like to apply a chi-square test to it, but I always get the following error: 'x' and 'y' must have at least 2 levels. This is of course the case for me, so I must have an error in the syntax.
churn_latest %>%
group_by(sex, churn) %>%
summarise(chi = chisq.test(sex, churn))
I would be very happy if someone had a solution to my problem. Many thanks in advance!
CodePudding user response:
You’ll first need to produce a contingency table from your data, which you can then pass to chisq.test. To produce the contingency table using ‘dplyr’ & ‘tidyr’ you can use:
churn_latest %>%
count(sex, churn) %>%
pivot_wider(names_from = sex, values_from = n, values_fill = 0L)
# A tibble: 2 × 3
  churn     M     W
  <lgl> <int> <int>
1 FALSE     4    18
2 TRUE      0     3
Next, you need to convert this into a matrix (dropping the key column):
… %>%
select(-churn) %>%
as.matrix()
     M  W
[1,] 4 18
[2,] 0  3
And that, finally, can be passed to chisq.test. Putting it all together:
churn_latest %>%
count(sex, churn) %>%
pivot_wider(names_from = sex, values_from = n, values_fill = 0L) %>%
select(-churn) %>%
as.matrix() %>%
chisq.test()
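For anyone who wants to run the pipeline end to end, here is a minimal sketch that rebuilds the 25 sample rows from the question. It assumes those rows are representative of churn_latest; the intermediate names counts and result are only illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the 25 sample rows shown in the question (assumed layout of churn_latest)
sex <- rep("W", 25)
sex[c(13, 18, 22, 23)] <- "M"
churn <- rep(FALSE, 25)
churn[c(20, 21, 24)] <- TRUE
churn_latest <- tibble(sex = factor(sex), churn = churn)

# Contingency table as a plain matrix: one row per churn value, one column per sex
counts <- churn_latest %>%
  count(sex, churn) %>%
  pivot_wider(names_from = sex, values_from = n, values_fill = 0L) %>%
  select(-churn) %>%
  as.matrix()

# chisq.test() returns an "htest" object; the p-value is stored in result$p.value
result <- chisq.test(counts)
result$p.value
```

With a sample this small, some expected cell counts fall below 5, so R warns that the chi-squared approximation may be incorrect. On the full data set that warning may well disappear.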
… to be fair, using ‘dplyr’ and ‘tidyr’ here is a bit overkill. Base R table does the same much more concisely:
churn_latest %>%
table() %>%
chisq.test()
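As for the error in the question: it most likely comes from group_by(sex, churn). Inside each group there is only one value of sex and one value of churn, so chisq.test sees a single level for each variable and refuses to run. The test has to be applied to the ungrouped data. If the real churn_latest has more columns than sex and churn, calling table() on the whole data frame would cross-tabulate all of them, so it is safer to name the two columns explicitly. A minimal sketch, again assuming the sample rows from the question:

```r
# Rebuild the 25 sample rows shown in the question (assumed layout of churn_latest)
sex <- rep("W", 25)
sex[c(13, 18, 22, 23)] <- "M"
churn <- rep(FALSE, 25)
churn[c(20, 21, 24)] <- TRUE
churn_latest <- data.frame(sex = factor(sex), churn = churn)

# Cross-tabulate only the two columns of interest
tab <- table(sex = churn_latest$sex, churn = churn_latest$churn)

# Run the test on the ungrouped contingency table
result <- chisq.test(tab)

# chisq.test() also accepts the two vectors directly and builds the table itself
result_xy <- chisq.test(churn_latest$sex, churn_latest$churn)
```

Both calls test the same 2 × 2 table, so they yield the same p-value.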