I have the following data in R:
gender <- c("Male","Female")
gender <- sample(gender, 5000, replace=TRUE, prob=c(0.45, 0.55))
gender <- as.factor(gender)
disease <- c("Yes","No")
disease <- sample(disease, 5000, replace=TRUE, prob=c(0.4, 0.6))
disease <- as.factor(disease)
status <- c("Immigrant","Citizen")
status <- sample(status, 5000, replace=TRUE, prob=c(0.3, 0.7))
status <- as.factor(status)
my_data = data.frame(gender, status, disease)
I want to make a table that shows:
- What percent of male immigrants have the disease?
- What percent of male non-immigrants have the disease?
- What percent of female immigrants have the disease?
- What percent of female non-immigrants have the disease?
I tried to do this with the following code:
t1 <- xtabs(disease ~ gender + status, data=my_data)
But I get this error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
‘sum’ not meaningful for factors
Can someone please show me what I am doing wrong and how to fix this?
Thank you!
CodePudding user response:
As there are several columns and all of them are factors, use count from dplyr and then get the proportions:
library(dplyr)
library(tidyr)
my_data %>%
  dplyr::count(across(everything())) %>%
  pivot_wider(names_from = disease, values_from = n, values_fill = 0) %>%
  group_by(gender) %>%
  mutate(100 * across(No:Yes, proportions)) %>%
  ungroup()
Output:
# A tibble: 4 × 4
  gender status       No   Yes
  <fct>  <fct>     <dbl> <dbl>
1 Female Citizen    69.4  72.4
2 Female Immigrant  30.6  27.6
3 Male   Citizen    70.4  68.7
4 Male   Immigrant  29.6  31.3
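Note that the table above gives, within each gender and each disease column, the split between citizens and immigrants. If what you want instead is the share of each gender/status group that has the disease (which is how I read the four bullets in the question), a minimal sketch could look like the one below. The set.seed call and the names n_obs, group_size and pct_disease are my own additions for illustration and are not part of the question.

```r
library(dplyr)

# Assumption: a seed is set only so the simulated data is reproducible.
set.seed(1)
n_obs <- 5000

# Same simulated data as in the question, built in one step.
my_data <- data.frame(
  gender  = factor(sample(c("Male", "Female"), n_obs, replace = TRUE, prob = c(0.45, 0.55))),
  status  = factor(sample(c("Immigrant", "Citizen"), n_obs, replace = TRUE, prob = c(0.3, 0.7))),
  disease = factor(sample(c("Yes", "No"), n_obs, replace = TRUE, prob = c(0.4, 0.6)))
)

# One row per gender/status group: group size and percent with disease == "Yes".
pct_by_group <- my_data %>%
  group_by(gender, status) %>%
  summarise(
    group_size  = n(),
    pct_disease = 100 * mean(disease == "Yes"),
    .groups = "drop"
  )

print(pct_by_group)
```

Since the data is simulated with disease independent of gender and status, each of the four percentages should come out close to 40.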
With xtabs, if we convert the column to integer, it could work as:
apply(xtabs(n ~ disease + gender + status,
            transform(my_data, n = as.integer(disease))), c(1, 2), proportions) * 100
, , gender = Female

           disease
status            No      Yes
  Citizen   69.36724 72.41993
  Immigrant 30.63276 27.58007

, , gender = Male

           disease
status            No      Yes
  Citizen   70.40185 68.68687
  Immigrant 29.59815 31.31313
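As for why the original call failed: when the formula has a left-hand side, xtabs treats it as a vector of counts to be summed within each cell, and sum is not defined for a factor. A one-sided formula simply counts rows, so no integer conversion is needed. A possible base R sketch that then takes percentages within each gender/status cell is shown below. The seed and the names counts, pct and pct_yes are assumptions of mine for illustration.

```r
# Assumption: a seed is set only so the simulated data is reproducible.
set.seed(1)
n_obs <- 5000

# Same simulated data as in the question.
my_data <- data.frame(
  gender  = factor(sample(c("Male", "Female"), n_obs, replace = TRUE, prob = c(0.45, 0.55))),
  status  = factor(sample(c("Immigrant", "Citizen"), n_obs, replace = TRUE, prob = c(0.3, 0.7))),
  disease = factor(sample(c("Yes", "No"), n_obs, replace = TRUE, prob = c(0.4, 0.6)))
)

# One-sided formula: xtabs counts the rows in each gender x status x disease cell.
counts <- xtabs(~ gender + status + disease, data = my_data)

# margin = c(1, 2) makes the percentages sum to 100 within each gender/status pair.
pct <- 100 * prop.table(counts, margin = c(1, 2))

# Keep only the "Yes" slice: percent with the disease per gender/status group.
pct_yes <- pct[, , "Yes"]
print(round(pct_yes, 1))
```

The result is a 2 x 2 table with gender in the rows and status in the columns, so each of the four questions maps to one cell.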