Show count of unique values in datasummary and combine two different tables of descriptive statistics

I really like the modelsummary package and I'm trying to produce a single table that mixes descriptive statistics of different types. The first part is easy: I can make basic descriptives of var2 and var3, as shown below. I can't get the second part right, though.

I'd like to get a count of the unique entries of the variable var1, i.e. 26.
I'd like to be able to combine the two into one table.

var1<-rep(LETTERS, 5)
var2<-rnorm(length(var1), mean=50, sd=10)
var3<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3)
library(gt)
library(modelsummary)

# This gets the descriptives of var2 and var3
datasummary(var2 + var3 ~ Mean + SD + N, data = df)
# This returns a long column with the number of entries for each value of var1;
# I would just like the number 26 here, combined with the table above
datasummary(var1 ~ length, data = df)

CodePudding user response：

You can use the add_rows argument (https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html#add_rows):

# One row with four cells: label, Mean, SD, N
new_row <- data.frame('var1',
                      "-",
                      "-",
                      length(unique(var1)))

datasummary(var2 + var3 ~ Mean + SD + N, data = df,
            add_rows = new_row)
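
As a self-contained variant of the same idea, here is a minimal sketch that recreates the question's data, computes the unique count with a small helper, and passes the extra row to add_rows. The helper name n_unique, the set.seed() call, and the row label are my own additions for illustration, not part of modelsummary or the original question. The table call is guarded so the script still runs if modelsummary is not installed, and output = "markdown" is used only so the result prints to the console.

```r
# Recreate the question's data (the seed is added only for reproducibility)
set.seed(1)
var1 <- rep(LETTERS, 5)
var2 <- rnorm(length(var1), mean = 50, sd = 10)
var3 <- rnorm(length(var1), mean = 10, sd = 5)
df <- data.frame(var1, var2, var3)

# Hypothetical helper (not a modelsummary function): number of distinct values
n_unique <- function(x) length(unique(x))

# One extra row with four cells, matching the main table's layout:
# row label, Mean, SD, N
new_row <- data.frame("var1 (unique values)",
                      "-",
                      "-",
                      n_unique(df$var1))

# Build the combined table only if modelsummary is available
if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)
  datasummary(var2 + var3 ~ Mean + SD + N,
              data = df,
              add_rows = new_row,
              output = "markdown")
}
```

The extra row needs exactly as many cells as the main table has columns (here four), otherwise the rows will not line up.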

CodePudding user response：

Mixing factor and numeric variables in datasummary() is kind of tricky. Here are two options.

The first approach is to create a first table with output="data.frame", and to feed it to the add_rows argument of a second table, inserting “empty” columns as necessary to align the two tables:

library(modelsummary)

var1<-rep(LETTERS[1:5], 5)
var2<-rep(LETTERS[8:12], 5)
var3<-rnorm(length(var1), mean=50, sd=10)
var4<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3, var4)

# function to insert empty columns
empty <- function(...) ""

ar <- datasummary(var1 + var2 ~ empty + empty + N,
                  data = df,
                  output = "data.frame")

datasummary(var3 + var4 ~ Heading("") * empty + Mean + SD + N,
            data = df,
            add_rows = ar)

            Mean    SD     N
var3        52.66   9.35   25
var4        9.21    5.25   25
var1   A                   5
       B                   5
       C                   5
       D                   5
       E                   5
var2   H                   5
       I                   5
       J                   5
       K                   5
       L                   5

The second approach is to use the datasummary_balance template function with ~1 as a formula argument. This is of course less flexible, but it works for simple cases:

datasummary_balance(~ 1, data = df)

            Mean   Std. Dev.
var3        52.7   9.4
var4        9.2    5.2
            N      Pct.
var1   A    5      20.0
       B    5      20.0
       C    5      20.0
       D    5      20.0
       E    5      20.0
var2   H    5      20.0
       I    5      20.0
       J    5      20.0
       K    5      20.0
       L    5      20.0