I am currently trying to display the count of factor levels (e.g., gender) and their relative frequency per group (e.g., treatment group) using datasummary. In addition, I would like to combine this with the display of quantitative variables (e.g., age) with their respective mean and standard deviation.
So far, I have written a function that displays the mean and SD in one column, and I have managed to calculate N and percentages. However, I am struggling with two things: writing a function that displays N and percentage together in one column, and adding an empty column to the datasummary of the quantitative variable so that both data frames can be combined (based on "Show count of unique values in datasummary" and "combine two different tables of descriptive statistics using data").
library(modelsummary)
library(magrittr)
library(dplyr)
set.seed(123)
iris$gender <- factor(sample(1:3, size = 150, replace = T),
labels = c("Male", "Female", "Other"))
iris$job <- factor(sample(1:5, size = 150, replace = T),
labels = c("Student", "Worker", "CEO", "Other", "None"))
empty <- function(...) ""
MeanSD = function(x) {
M = mean(x, na.rm = T)
SD = sd(x, na.rm = T)
MSD = paste(round(M, 2), " (",round(SD,2), ")", sep = "")
return(MSD)
}
#This function does not work properly
NP = function(x, y) {
N = N(x)
P = Percent(x, y, denom = "col")
out = paste(N, " (",P, ")", sep = "")
return(NP)
}
iris_tab1 <- iris %>% dplyr::select(Species,
Gender = gender,
Job = job,
Length = Sepal.Length)
tbl_1 <- datasummary((Heading("")*N + Heading("")*Percent(fn = function(x, y) 100 * length(x) / length(y), denom = "col"))*(Gender + Job)~Species,
data = iris_tab1,
fmt = 2,
output = 'data.frame'
)
tbl_1
#Cannot add the empty column
tbl_2 <- datasummary(Heading("")*(MeanSD)*Length~empty + Species,
data = iris_tab1,
output = 'data.frame'
)
tbl_2
Answer:
empty is a function, and MeanSD is a function. All functions need to go on the same side of the datasummary formula:
library(modelsummary)
library(magrittr)
library(dplyr)
set.seed(123)
iris$gender <- factor(sample(1:3, size = 150, replace = T),
labels = c("Male", "Female", "Other"))
iris$job <- factor(sample(1:5, size = 150, replace = T),
labels = c("Student", "Worker", "CEO", "Other", "None"))
empty <- function(...) ""
MeanSD = function(x) {
M = mean(x, na.rm = T)
SD = sd(x, na.rm = T)
MSD = paste(round(M, 2), " (", round(SD, 2), ")", sep = "")
return(MSD)
}
iris_tab1 <- iris %>%
dplyr::select(Species,
Gender = gender,
Job = job,
Length = Sepal.Length)
tbl_2 <- datasummary(Heading("") * Length ~ empty + MeanSD * Species,
data = iris_tab1,
output = "data.frame")
tbl_2
#>   empty      setosa  versicolor   virginica
#> 1       5.01 (0.35) 5.94 (0.52) 6.59 (0.64)
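The cell values in that output can be cross-checked without modelsummary. The following is only a minimal base R sketch: it redefines the same MeanSD helper and applies it per species with tapply; the object name check is just an illustrative choice, not part of the original answer.

```r
# Same helper as in the answer: "mean (sd)", each rounded to two decimals
MeanSD <- function(x) {
  paste0(round(mean(x, na.rm = TRUE), 2), " (", round(sd(x, na.rm = TRUE), 2), ")")
}

# One "mean (sd)" string per species, computed on the built-in iris data
# ("check" is a hypothetical name used only for this sketch)
check <- tapply(iris$Sepal.Length, iris$Species, MeanSD)
print(check)
```

If the strings match the setosa, versicolor, and virginica cells of tbl_2, the formula is applying MeanSD to the intended subsets.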
Simple illustration of Percent function:
library(modelsummary)
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
fn <- function(x, y) {
out <- sprintf(
"%s (%.1f%%)",
length(x),
length(x) / length(y) * 100)
}
datasummary(
cyl ~ Percent(fn = fn),
data = dat)
| cyl | Percent    |
|-----|------------|
| 4   | 11 (34.4%) |
| 6   | 7 (21.9%)  |
| 8   | 14 (43.8%) |
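To see what the custom function receives, it can help to call it by hand. The sketch below assumes that, with the default denominator, x is the subset of observations in a cell and y is the full set of observations; it mimics that with split and vapply in base R. The name res is hypothetical and only used here.

```r
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)

# Same formatter as above: count of the cell, then its share of y in percent
fn <- function(x, y) {
  sprintf("%s (%.1f%%)", length(x), length(x) / length(y) * 100)
}

# Emulate the per-level calls: x = observations in one level, y = all observations
# ("res" is a hypothetical name used only for this sketch)
res <- vapply(split(dat$cyl, dat$cyl), fn, character(1), y = dat$cyl)
print(res)
```

The resulting strings should agree with the Percent column in the table above, which suggests the same two-argument function could be reused wherever a combined "N (%)" cell is wanted.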