Show count of unique values in datasummary and combine two different tables of descriptive statistics

Time:04-05

I really like the modelsummary package, and I'm trying to produce a single table that mixes descriptive statistics of different types. The first part is easy: I can produce basic descriptives of var2 and var3, as in the code below. I can't get the second part right, though.

  1. I'd like to get a count of the unique entries of the variable var1, i.e. 26.
  2. I'd like to be able to combine the two into one table.
var1<-rep(LETTERS, 5)
var2<-rnorm(length(var1), mean=50, sd=10)
var3<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3)
library(gr)
library(modelsummary)

# This gets the descriptives of var2 and var3
datasummary(var2 + var3 ~ Mean + SD + N, data = df)
# This returns a long column with the number of entries for each value of var1;
# I would just like the number 26 here, combined with the table above
datasummary(var1 ~ length, data = df)

CodePudding user response:

Based on the add_rows argument (https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html#add_rows), you can build the extra row yourself and append it to the table:

# One extra row with the same four columns as the table: label, Mean, SD, N
new_row <- data.frame('var1',
                      "-",
                      "-",
                      length(unique(var1)))

datasummary(var2 + var3 ~ Mean + SD + N, data = df,
            add_rows = new_row)
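
A slightly more defensive version of the same idea might look like the sketch below. It takes the count from df$var1 rather than from the free-standing vector and stores every cell as text. The seed, the column names (label, Mean, SD, N) and the row label "var1 (unique values)" are assumptions made for illustration and are not part of the question; as far as I know only the number of columns has to match the main table.

```r
# Rebuild the question's data (the seed is added here only for reproducibility).
set.seed(1)
var1 <- rep(LETTERS, 5)
var2 <- rnorm(length(var1), mean = 50, sd = 10)
var3 <- rnorm(length(var1), mean = 10, sd = 5)
df <- data.frame(var1, var2, var3)

# One extra row with as many columns as the main table:
# row label, Mean, SD, N. Column names here are illustrative only.
new_row <- data.frame(
  label = "var1 (unique values)",
  Mean  = "-",
  SD    = "-",
  N     = as.character(length(unique(df$var1))),
  stringsAsFactors = FALSE
)

# Draw the combined table only if modelsummary is installed.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)
  datasummary(var2 + var3 ~ Mean + SD + N, data = df, add_rows = new_row)
}
```

By default the extra row is appended at the bottom of the table, so the unique count of var1 should appear under the var2 and var3 rows.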

CodePudding user response:

Mixing factor and numeric variables in datasummary() is kind of tricky. Here are two options.

The first approach is to create a first table with output="data.frame", and to feed it to the add_rows argument of a second table, inserting “empty” columns as necessary to align the two tables:

library(modelsummary)

var1<-rep(LETTERS[1:5], 5)
var2<-rep(LETTERS[8:12], 5)
var3<-rnorm(length(var1), mean=50, sd=10)
var4<-rnorm(length(var1), mean=10, sd=5)
df<-data.frame(var1, var2, var3, var4)

# function to insert empty columns
empty <- function(...) ""

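# table of counts for the categorical variables, returned as a data frame
ar <- datasummary(var1 + var2 ~ empty + empty + N,
                  data = df,
                  output = "data.frame")

# numeric summaries, with the categorical counts appended as extra rows
datasummary(var3 + var4 ~ Heading("") * empty + Mean + SD + N,
            data = df,
            add_rows = ar)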

             Mean    SD     N
var3         52.66   9.35   25
var4         9.21    5.25   25
var1   A                    5
       B                    5
       C                    5
       D                    5
       E                    5
var2   H                    5
       I                    5
       J                    5
       K                    5
       L                    5

The second approach is to use the datasummary_balance template function with ~1 as a formula argument. This is of course less flexible, but it works for simple cases:

datasummary_balance(~ 1, data = df)

             Mean   Std. Dev.
var3         52.7   9.4
var4         9.2    5.2
             N      Pct.
var1   A     5      20.0
       B     5      20.0
       C     5      20.0
       D     5      20.0
       E     5      20.0
var2   H     5      20.0
       I     5      20.0
       J     5      20.0
       K     5      20.0
       L     5      20.0