Error with function to include in datasummary

I am trying to create a table with factor and numeric variables using modelsummary. My approach is to convert the factor variables to numeric so that each one takes up a single row and all variables appear in the same column. I then manually calculate the number of units in each level of each former factor and assign that count as text to the corresponding variable. I am trying to do this with the function N_alt in the example below:

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}


# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]` = class) + (`region [A,B,C]` = region) + hp ~ Heading("N (%)") * N_alt, data = tmp)

which gives me:

My N_alt function does not work properly. class is correct, but region is not. I am not getting any warning messages.

I have also tried:

N_alt = function(x) {
  if (x[1] %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x[1] %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

but I obtained the same output. I have created similar functions with these vectors and they worked fine, but for some reason this one is not working.

Additionally, I also tried:

N_alt <- c('[32 (100)]','[14 (43.8); 18 (56.3)]','[14 (43.8); 6 (18.8); 12 (37.5)]','[32 (100)]')

and

N_alt <- c(rep('[32 (100)]',32),rep('[14 (43.8); 18 (56.3)]',32),rep('[14 (43.8); 6 (18.8); 12 (37.5)]',32),rep('[32 (100)]',32))

but I get:

Error in datasummary(mpg + (`class [0,1]` = class) + (`region [A,B,C]` = region) +  : 
  Argument 'N_alt' is not length 32

Does anyone know what I am missing here?

Edit:

It does seem possible to run functions such as Mean_alt below, so that certain numeric variables are shown without decimal places (converting them with as.integer did not work for me) and the former factor variables show no result for Mean in the table (two different actions):

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

Mean_alt = function(x) {
  if (x %in% c(tmp$mpg)) {
    as.character(floor(mean(x)), length=5)
  } else if (x %in% c(tmp$class, tmp$region)) {
    paste0("")
  } else {
    mean(x)
  }
}

# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]` = class) + (`region [A,B,C]` = region) + hp ~ Heading("N (%)") * N_alt + Heading("Mean") * Mean_alt, data = tmp)

output:

Answer:

You are running up against three limitations.

The first limitation is in Base R:

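As explained in the R manual, the condition of an if/else statement must evaluate to a single TRUE or FALSE. Internally, datasummary applies N_alt to each variable in turn, and each time N_alt receives a new vector of length 32, so x %in% tmp$class is a logical vector of length 32 rather than a single value. Checking only the first element, x[1], does not help either: in your example both class and region have been overwritten with 0, so the region vector also satisfies the first condition and never reaches the else if branch. Frankly, I don’t see how testing the values of that vector can get us where we want to go.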
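
A minimal base R sketch of this first limitation, using the question's own data after both columns have been overwritten with 0. The names x and cond are only illustrative. In R 4.2.0 and later, an if condition of length greater than one is an error; earlier versions used only the first element and issued a warning.

```r
# Rebuild the question's data: class and region both end up as all zeros
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0
tmp$region <- 0

x <- tmp$region            # what a summary function would receive for `region`
cond <- x %in% tmp$class   # element-wise test: one TRUE/FALSE per element

length(cond)               # 32, so `cond` is not a valid `if` condition
all(cond)                  # TRUE: the region values also "match" class

# To get a single TRUE/FALSE, the vector has to be reduced explicitly
if (all(x %in% tmp$class)) "matched class" else "no match"
```

Even with an explicit all(), the test cannot tell region from class here, because the two columns hold the same values.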

The two other limitations have to do with the fundamental design of the tables package, on which modelsummary::datasummary is based:

Factors will always generate one row per factor level.
I don’t think there is a good way to tell datasummary that a function should behave differently when applied to different numeric variables. This is because each function only sees the raw numeric vector, and not other meta-information.
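
To illustrate the last point, here is a small base R sketch. It assumes that vapply over the columns is a fair stand-in for how a summary function is applied to each variable; the data frame df and the function describe are hypothetical and not part of modelsummary.

```r
# Two hypothetical columns that hold the same values but mean different things
df <- data.frame(class = rep(0, 4), region = rep(0, 4))

# A summary function receives only the values, never the column name
describe <- function(x) paste(class(x), length(x), sum(x))

out <- vapply(df, describe, character(1))
out

# The function cannot produce different results for the two columns
identical(out[["class"]], out[["region"]])  # TRUE
```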

I think the easiest workaround is to create two tables, one for your factors and one for your numeric variables. Then, these tables can easily be combined:

library(modelsummary)

N_factor <- function(x) {
  count <- table(x)
  pct <- prop.table(count)
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

N_numeric <- function(x) {
  sprintf("%s (100)", length(x))
}

tab_fac <- datasummary(cyl + gear ~ Heading("N") * N_factor,
                       output = "data.frame",
                       data = mtcars)

datasummary(mpg + hp ~ Heading("N") * N_numeric,
            add_rows = tab_fac,
            data = mtcars)

	N
mpg	32 (100)
hp	32 (100)
cyl	[11 (0.3); 7 (0.2); 14 (0.4)]
gear	[15 (0.5); 12 (0.4); 5 (0.2)]
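
The table above prints proportions (for example 0.3), whereas the question shows percentages (for example 43.8). If percentages are wanted, one possible variant is sketched below. N_factor_pct is a hypothetical name, and the only change from N_factor is multiplying the proportions by 100.

```r
# Hypothetical variant of N_factor that prints percentages instead of proportions
N_factor_pct <- function(x) {
  count <- table(x)
  pct <- 100 * prop.table(count)   # percentages rather than proportions
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

N_factor_pct(mtcars$gear)
# "[15 (46.9); 12 (37.5); 5 (15.6)]"
```

It should be usable in place of N_factor in the two-table workaround without other changes, since it takes the same input and returns a single string.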