Function that tabulates for specific values and returns counts

Imagine you have a vector x:

x <- c("C", "A", "B", "B", "A", "D", "B", "B", "A", "A", "A", "A", "A", "D", "C", "A", "C", "A", "A", "C", "A", "A", "D", "A", "D", "A", "D", "A", "A", "D", "D", "B", "B", "A", "A", "C", "A", "A", "B", "B", "B", "B", "B", "B", "B", "A", "C", "A", "C", "B")

You can make a table using:

table(x)
# x
#  A  B  C  D 
# 22 14  7  7

What if you only want the table to include certain values (e.g. 'A' and 'B'), or you want it to include values that might not exist in x?

This is my attempt:

tab_specific_values <- function(vector, values) `names<-`(rowSums(outer(values, vector, `==`)), values)

For example:

tab_specific_values(vector = x, values = c('A', 'B'))
# A  B 
# 22 14

Or, if we specify a value that does not exist in x:

tab_specific_values(vector = x, values = c('A', 'B', 'E'))
# A  B  E 
# 22 14  0
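To make the one-liner easier to follow, here is a rough step-by-step sketch of what it appears to do. The name tab_specific_values_verbose and the vector small are hypothetical and used only for illustration; the logic is meant to mirror the original function, not replace it.

```r
# Hypothetical expanded form of tab_specific_values, for illustration only.
tab_specific_values_verbose <- function(vector, values) {
  # outer() builds a length(values) x length(vector) logical matrix:
  # entry [i, j] is TRUE when values[i] == vector[j].
  hits <- outer(values, vector, `==`)
  # Each row sum is the number of matches for one requested value.
  counts <- rowSums(hits)
  # Label each count with the value it belongs to.
  names(counts) <- values
  counts
}

# Small made-up input so the intermediate matrix is easy to inspect.
small <- c("A", "B", "A", "C")
outer(c("A", "E"), small, `==`)  # 2 x 4 logical matrix of matches
tab_specific_values_verbose(vector = small, values = c("A", "E"))
# A E 
# 2 0 
```

One thing to keep in mind with this approach: if the input contains NA, the comparison yields NA and rowSums() returns NA for every row unless na.rm = TRUE is passed.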

Is there an existing dedicated function that does this, or do you have a better approach? I suspect my function tab_specific_values might not be the best approach.

CodePudding user response：

Convert x to a factor whose levels are the values of interest, then call table(). Levels that never occur in x are reported with a count of 0:

# values to count
v <- c("A", "B", "E")

table(factor(x, levels = v))
#  A  B  E 
# 22 14  0
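If the same signature as the function in the question is wanted, the factor idiom could be wrapped along these lines. This is only a sketch: tab_levels and the vector y are hypothetical names, not something from an existing package.

```r
# Hypothetical helper; wraps the factor-then-table idiom shown above.
tab_levels <- function(vector, values) {
  # Values not listed in `values` become NA in the factor and are
  # dropped by table() under its default settings.
  table(factor(vector, levels = values))
}

# Made-up input including a value outside `values` ("C") and an NA.
y <- c("A", "B", "A", "C", NA)
counts <- tab_levels(y, c("A", "B", "E"))
counts
# A B E 
# 2 1 0 

# table() returns a "table" object; convert it if a plain named vector is needed.
counts_vec <- setNames(as.vector(counts), names(counts))
```

Unlike the outer()-based function, this version returns integer counts and silently ignores NA in the input.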

CodePudding user response：

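Benchmarking the three approaches on the original x (50 elements), counting only 'C' and 'D':

library(microbenchmark)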

microbenchmark(
  a = table(x, exclude = c('A', 'B')),
  b = table(factor(x, levels = c('C', 'D'))),
  c = tab_specific_values(vector = x, values = c('C', 'D')),
  times = 1000
)

Unit: microseconds
 expr     min       lq      mean  median       uq       max neval
    a 116.401 131.6505 177.20030 145.201 236.8010   604.701  1000
    b  49.302  60.0010  92.33422  66.501 109.4510 10974.101  1000
    c  13.301  20.1005  29.09018  24.201  36.3015   134.901  1000

When x has 1,000,000 elements (100 evaluations each):

Unit: milliseconds
 expr      min        lq      mean    median        uq      max neval
    a 119.3651 131.24110 142.63383 137.50385 144.07945 233.1265   100
    b  43.9441  48.18640  58.24316  54.75485  59.12390 129.5087   100
    c  48.9598  55.33825  67.03932  62.64145  65.93755 152.9490   100
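The answer does not say how the 1,000,000-element vector was built, so the sketch below simply assumes it was sampled from the same four letters. The name x_big, the seed, and the reduced number of repetitions are assumptions for illustration. It first checks that the factor approach (b) and the outer() approach (c) agree, then times them if the microbenchmark package is installed.

```r
# Assumption: the long vector is simulated by uniform sampling of the four
# letters; the original post does not describe how it was created.
set.seed(42)
x_big <- sample(c("A", "B", "C", "D"), size = 1e6, replace = TRUE)

# Function from the question, repeated so this block is self-contained.
tab_specific_values <- function(vector, values) `names<-`(rowSums(outer(values, vector, `==`)), values)

v <- c("C", "D")
res_factor <- table(factor(x_big, levels = v))
res_outer <- tab_specific_values(vector = x_big, values = v)

# Both approaches should report the same counts under the same labels.
stopifnot(all(as.vector(res_factor) == as.vector(res_outer)))
stopifnot(identical(names(res_factor), names(res_outer)))

# Timing is optional and only runs when microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    b = table(factor(x_big, levels = v)),
    c = tab_specific_values(vector = x_big, values = v),
    times = 10
  ))
}
```

In the timings above, the outer() version has the lowest median on the 50-element vector, while the factor version is slightly ahead at 1,000,000 elements. One plausible contributor is that outer() allocates a length(values) by length(vector) matrix, so its cost grows with both inputs.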