I am trying to create a table with factor and numeric variables using modelsummary. My approach is to convert the factor variables to numeric, so that each factor variable takes only one row and all variables appear in the same column. I then calculate by hand the number of units in each level of each former factor variable and assign the result as text to that variable's row. I am trying to do this with the function N_alt in the example below:
library(modelsummary)
library(kableExtra)
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)
tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)
tmp$class <- 0
tmp$region <- 0
N_alt = function(x) {
if (x %in% c(tmp$class)) {
paste0('[14 (43.8); 18 (56.3)]')
} else if (x %in% c(tmp$region)) {
paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')
} else {
paste0('[32 (100)]')
}
}
# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt, data = tmp)
My N_alt function does not work properly: the result for class is correct, but the result for region is not. I am not getting any warning messages.
I have also tried:
N_alt = function(x) {
if (x[1] %in% c(tmp$class)) {
paste0('[14 (43.8); 18 (56.3)]')
} else if (x[1] %in% c(tmp$region)) {
paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')
} else {
paste0('[32 (100)]')
}
}
but I obtained the same output. I have written similar functions with these vectors and they worked fine, but for some reason this one does not.
Additionally, I also tried:
N_alt <- c('[32 (100)]','[14 (43.8); 18 (56.3)]','[14 (43.8); 6 (18.8); 12 (37.5)]','[32 (100)]')
and
N_alt <- c(rep('[32 (100)]',32),rep('[14 (43.8); 18 (56.3)]',32),rep('[14 (43.8); 6 (18.8); 12 (37.5)]',32),rep('[32 (100)]',32))
but I get:
Error in datasummary(mpg (`class [0,1]` = class) (`region [A,B,C]` = region) :
Argument 'N_alt' is not length 32
Does anyone know what I am missing here?
Edit:
It seems to be possible to use functions such as Mean_alt below for two different purposes: so that certain numeric variables are shown without decimal places (just converting them with as.integer did not work for me), and so that the former factor variables show no result for Mean in the table:
library(modelsummary)
library(kableExtra)
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)
tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)
tmp$class <- 0
tmp$region <- 0
N_alt = function(x) {
if (x %in% c(tmp$class)) {
paste0('[14 (43.8); 18 (56.3)]')
} else if (x %in% c(tmp$region)) {
paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')
} else {
paste0('[32 (100)]')
}
}
Mean_alt = function(x) {
if (x %in% c(tmp$mpg)) {
as.character(floor(mean(x)), length=5)
} else if (x %in% c(tmp$class, tmp$region)) {
paste0("")
} else {
mean(x)
}
}
# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt + Heading("Mean") * Mean_alt, data = tmp)
Answer:
You are running into three limitations.
The first limitation is in base R:
- As explained in the R manual, the condition in an if/else statement must evaluate to a single TRUE or FALSE. Internally, datasummary will apply N_alt to each variable one after the other. Each time, N_alt receives a new vector of length 32. Frankly, I don’t think it makes much sense to check the value of the first element of that vector; I don’t see how this can get us where we want to go.
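To make the first point concrete, here is a small base-R check. It is only a sketch and assumes tmp is built as in the question, where the last two assignments overwrite class and region with zeros. Under that assumption the two columns are identical, so any test against tmp$class also succeeds for region, and the first branch wins for both:

```r
# Rebuild the relevant part of the question's data (assumption: class and
# region end up as all-zero numeric columns, as in the question's last step).
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0
tmp$region <- 0

# %in% returns one logical value per element, not a single TRUE/FALSE.
cond <- tmp$region %in% tmp$class
length(cond)   # 32

# Using a length-32 condition in `if` is an error in R >= 4.2.0; older
# versions used only the first element and issued a warning.
res <- tryCatch(
  if (cond) "first branch",
  error   = function(e) "error",
  warning = function(w) "warning"
)

# Checking only the first element does not help: region's first value (0)
# is also found in class, so region would receive the label meant for class.
region_hits_class <- tmp$region[1] %in% tmp$class   # TRUE
mpg_hits_class    <- tmp$mpg[1] %in% tmp$class      # FALSE
same_columns      <- identical(tmp$class, tmp$region)
```

Because the function only ever sees values, two columns holding the same values cannot be told apart inside it.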
The two other limitations have to do with the fundamental design of the tables package, on which modelsummary::datasummary is based:
- Factors will always generate one row per factor level.
- I don’t think there is a good way to tell datasummary that a function should behave differently when applied to different numeric variables. This is because each function only sees the raw numeric vector, and not other meta-information.
I think the easiest workaround is to create two tables, one for your factors and one for your numeric variables. These tables can then easily be combined:
library(modelsummary)
N_factor <- function(x) {
count <- table(x)
pct <- prop.table(count)
out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
sprintf("[%s]", out)
}
N_numeric <- function(x) {
sprintf("%s (100)", length(x))
}
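# table for the factor-like variables, returned as a data frame
tab_fac <- datasummary(cyl + gear ~ Heading("N") * N_factor,
                       output = "data.frame",
                       data = mtcars)
# table for the numeric variables, with the factor rows appended
datasummary(mpg + hp ~ Heading("N") * N_numeric,
            add_rows = tab_fac,
            data = mtcars)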
| | N |
|---|---|
| mpg | 32 (100) |
| hp | 32 (100) |
| cyl | [11 (0.3); 7 (0.2); 14 (0.4)] |
| gear | [15 (0.5); 12 (0.4); 5 (0.2)] |
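Note that the factor rows above show proportions, while the numeric rows show 100. If percentages are wanted throughout, as in the question, one possible variant is sketched below. N_factor_pct is a hypothetical name introduced here; it should be usable wherever N_factor is used above, but the example only exercises it in base R:

```r
# Variant of N_factor that reports percentages instead of proportions.
# N_factor_pct is an illustrative name, not part of modelsummary.
N_factor_pct <- function(x) {
  count <- table(x)                 # units per level
  pct <- 100 * prop.table(count)    # share of each level, in percent
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

cyl_label <- N_factor_pct(mtcars$cyl)
cyl_label   # "[11 (34.4); 7 (21.9); 14 (43.8)]"
```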