I really like the modelsummary package, and I'm trying to produce a single table that mixes descriptive statistics of different types. The first part is easy: I can produce basic descriptives of var2 and var3 (see the code below). I can't get the second part right, though.
- I'd like to get a count of the unique entries of the variable var1, i.e. 26.
- I'd like to be able to combine the two into one table.
var1<-rep(LETTERS, 5)
var2<-rnorm(length(var1), mean=50, sd=10)
var3<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3)
library(gt)
library(modelsummary)
# This gets the descriptives of var2 and var3
datasummary(var2 + var3 ~ Mean + SD + N, data = df)
# This returns a long column with the number of entries for each value of var1;
# I would just like the number 26 here, combined with the table above
datasummary(var1 ~ length, data = df)
CodePudding user response:
Based on the add_rows argument (https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html#add_rows):
# one extra row: label, two placeholder cells, number of unique values of var1
new_row <- data.frame("var1",
                      "-",
                      "-",
                      length(unique(var1)))
datasummary(var2 + var3 ~ Mean + SD + N, data = df,
            add_rows = new_row)
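For readers who want something to paste and run, here is a minimal sketch of the same idea. The column names of new_row (label, Mean, SD, N) are only illustrative assumptions; as far as I know what matters is that the extra row has as many columns as the main table. The "position" attribute is the mechanism described in the vignette linked above for choosing where the row lands. The call is guarded so that the script still runs when modelsummary is not installed.

```r
set.seed(1)
var1 <- rep(LETTERS, 5)
var2 <- rnorm(length(var1), mean = 50, sd = 10)
var3 <- rnorm(length(var1), mean = 10, sd = 5)
df <- data.frame(var1, var2, var3)

# One extra row with four cells, matching the four columns of the main table
# (row label, Mean, SD, N). The column names are illustrative only.
new_row <- data.frame(
  label = "var1",
  Mean  = "-",
  SD    = "-",
  N     = as.character(length(unique(df$var1)))
)

# Optional: ask for the extra row to be placed first instead of last
attr(new_row, "position") <- 1

if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)
  tab <- datasummary(var2 + var3 ~ Mean + SD + N,
                     data = df,
                     add_rows = new_row,
                     output = "data.frame")
  print(tab)
}
```

Using output = "data.frame" here is just a convenient way to inspect the result; dropping that argument gives the usual formatted table.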
CodePudding user response:
Mixing factor and numeric variables in datasummary() is kind of tricky. Here are two options.
The first approach is to create a first table with output="data.frame", and to feed it to the add_rows argument of a second table, inserting “empty” columns as necessary to align the two tables:
library(modelsummary)
var1<-rep(LETTERS[1:5], 5)
var2<-rep(LETTERS[8:12], 5)
var3<-rnorm(length(var1), mean=50, sd=10)
var4<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3, var4)
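# function to insert empty columns
empty <- function(...) ""
# first table: counts per level of the categorical variables, as a data frame
ar <- datasummary(var1 + var2 ~ empty + empty + N,
                  data = df,
                  output = "data.frame")
# second table: numeric summaries, with the first table appended as extra rows
datasummary(var3 + var4 ~ Heading("") * empty + Mean + SD + N,
            data = df,
            add_rows = ar)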
| | | Mean | SD | N |
|---|---|---|---|---|
| var3 | | 52.66 | 9.35 | 25 |
| var4 | | 9.21 | 5.25 | 25 |
| var1 | A | | | 5 |
| | B | | | 5 |
| | C | | | 5 |
| | D | | | 5 |
| | E | | | 5 |
| var2 | H | | | 5 |
| | I | | | 5 |
| | J | | | 5 |
| | K | | | 5 |
| | L | | | 5 |
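To make the alignment requirement concrete, here is a rough base R sketch that builds the same kind of five-column block by hand. count_rows() is a hypothetical helper written for this illustration, not part of modelsummary, and the layout (variable, level, two empty cells, N) is an assumption that simply mirrors the table above.

```r
var1 <- rep(LETTERS[1:5], 5)
var2 <- rep(LETTERS[8:12], 5)
df <- data.frame(var1, var2)

# Hypothetical helper: one row per level of `variable`, laid out as
# variable | level | "" | "" | N, i.e. five columns like the table above.
count_rows <- function(data, variable) {
  counts <- table(data[[variable]])
  data.frame(
    variable = c(variable, rep("", length(counts) - 1)),  # name on first row only
    level    = names(counts),
    mean     = "",                                        # empty cell under Mean
    sd       = "",                                        # empty cell under SD
    n        = as.character(as.vector(counts)),
    stringsAsFactors = FALSE
  )
}

ar_manual <- rbind(count_rows(df, "var1"), count_rows(df, "var2"))
print(ar_manual)
```

A data frame shaped like ar_manual could then be supplied through add_rows in the same way as ar, provided its column count matches the main table.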
The second approach is to use the datasummary_balance template function with ~1 as a formula argument. This is of course less flexible, but it works for simple cases:
datasummary_balance(~ 1, data = df)
| | | Mean | Std. Dev. |
|---|---|---|---|
| var3 | | 52.7 | 9.4 |
| var4 | | 9.2 | 5.2 |
| | | N | Pct. |
| var1 | A | 5 | 20.0 |
| | B | 5 | 20.0 |
| | C | 5 | 20.0 |
| | D | 5 | 20.0 |
| | E | 5 | 20.0 |
| var2 | H | 5 | 20.0 |
| | I | 5 | 20.0 |
| | J | 5 | 20.0 |
| | K | 5 | 20.0 |
| | L | 5 | 20.0 |
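As a quick sanity check on the N and Pct. columns, the same numbers can be reproduced in base R. This is only a sketch of the arithmetic (count per level, then share of all rows), not a description of how datasummary_balance computes them internally; level_summary is a name made up for this example.

```r
var1 <- rep(LETTERS[1:5], 5)

counts <- table(var1)                       # number of rows per level
pct    <- round(100 * prop.table(counts), 1)  # share of all rows, in percent

level_summary <- data.frame(
  level = names(counts),
  N     = as.vector(counts),
  Pct   = as.vector(pct)
)
print(level_summary)

# Number of distinct values, which is what the question asks for
n_unique <- length(unique(var1))
print(n_unique)
```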