Calculate percentages in skimr::skim_with

Time:02-28

I am trying to add the percentage of each factor level to the skimr::skim output. I tried the table function, but it did not work as intended: the factor row is repeated once per level. How can I get the percentages of the different species in the correct format, similar to top_counts?

library(skimr)
skim(iris)
Name iris
Number of rows 150
Number of columns 5
_______________________
Column type frequency:
factor 1
numeric 4
________________________
Group variables None

Data summary

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Sepal.Length 0 1 5.84 0.83 4.3 5.1 5.80 6.4 7.9 ▆▇▇▅▂
Sepal.Width 0 1 3.06 0.44 2.0 2.8 3.00 3.3 4.4 ▁▆▇▂▁
Petal.Length 0 1 3.76 1.77 1.0 1.6 4.35 5.1 6.9 ▇▁▆▇▂
Petal.Width 0 1 1.20 0.76 0.1 0.3 1.30 1.8 2.5 ▇▁▇▅▃
my_skim <- skim_with(factor=sfl(pct = ~prop.table(table(.))))
my_skim(iris)
Name iris
Number of rows 150
Number of columns 5
_______________________
Column type frequency:
factor 1
numeric 4
________________________
Group variables None

Data summary

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts pct
Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50 0.3333333
Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50 0.3333333
Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50 0.3333333

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Sepal.Length 0 1 5.84 0.83 4.3 5.1 5.80 6.4 7.9 ▆▇▇▅▂
Sepal.Width 0 1 3.06 0.44 2.0 2.8 3.00 3.3 4.4 ▁▆▇▂▁
Petal.Length 0 1 3.76 1.77 1.0 1.6 4.35 5.1 6.9 ▇▁▆▇▂
Petal.Width 0 1 1.20 0.76 0.1 0.3 1.30 1.8 2.5 ▇▁▇▅▃

Created on 2022-02-27 by the reprex package (v2.0.1)

Answer:

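A function passed to sfl() should return a single value per column. prop.table(table(.)) returns one value per level, which is why Species shows up in three rows. We can paste (str_c) the level names and proportions into a single string, as top_counts does.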

library(skimr)
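my_skim <- skim_with(factor = sfl(pct = ~ {
  prt <- prop.table(table(.))               # proportion of each level
  val <- sprintf("%.2f", prt)               # format to two decimals
  nm1 <- tolower(substr(names(prt), 1, 3))  # three-letter labels, similar to top_counts
  # collapse everything into one string so the skimmer returns a single value
  stringr::str_c(nm1, val, sep = ": ", collapse = ", ")
}))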

Testing:

> my_skim(iris)
── Data Summary ────────────────────────
                           Values
Name                       iris  
Number of rows             150   
Number of columns          5     
_______________________          
Column type frequency:           
  factor                   1     
  numeric                  4     
________________________         
Group variables            None  

── Variable type: factor ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate ordered n_unique top_counts                pct                            
1 Species               0             1 FALSE          3 set: 50, ver: 50, vir: 50 set: 0.33, ver: 0.33, vir: 0.33

── Variable type: numeric ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
1 Sepal.Length          0             1  5.84 0.828   4.3   5.1  5.8    6.4   7.9 ▆▇▇▅▂
2 Sepal.Width           0             1  3.06 0.436   2     2.8  3      3.3   4.4 ▁▆▇▂▁
3 Petal.Length          0             1  3.76 1.77    1     1.6  4.35   5.1   6.9 ▇▁▆▇▂
4 Petal.Width           0             1  1.20 0.762   0.1   0.3  1.3    1.8   2.5 ▇▁▇▅▃
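
Since the question asks for percentages rather than proportions, the same idea can be sketched in base R alone. This is a minimal sketch, not part of skimr: the helper name pct_levels and the one-decimal format are assumptions chosen for illustration, and paste() stands in for stringr::str_c().

```r
# Hypothetical helper (not a skimr function): collapse the share of each
# factor level into a single string such as "set: 33.3%, ver: 33.3%".
pct_levels <- function(x) {
  prt <- prop.table(table(x))               # share of each level; table() drops NAs by default
  val <- sprintf("%.1f%%", 100 * prt)       # format as a percentage with one decimal
  nm  <- tolower(substr(names(prt), 1, 3))  # three-letter labels, similar to top_counts
  paste(nm, val, sep = ": ", collapse = ", ")
}

pct_levels(iris$Species)
# [1] "set: 33.3%, ver: 33.3%, vir: 33.3%"
```

Because the helper returns a single string, it should be usable as a skimmer in the same way as the formula above, for example skim_with(factor = sfl(pct = pct_levels)).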