Imagine you have a vector x:
x <- c("C", "A", "B", "B", "A", "D", "B", "B", "A", "A", "A", "A", "A", "D", "C", "A", "C", "A", "A", "C", "A", "A", "D", "A", "D", "A", "D", "A", "A", "D", "D", "B", "B", "A", "A", "C", "A", "A", "B", "B", "B", "B", "B", "B", "B", "A", "C", "A", "C", "B")
You can make a table using:
table(x)
# x
# A B C D
# 22 14 7 7
What if you only want the table to include certain values (e.g. 'A' and 'B'), or you want it to include values that might not exist in x?
This is my attempt:
tab_specific_values <- function(vector, values) `names<-`(rowSums(outer(values, vector, `==`)), values)
For example:
tab_specific_values(vector = x, values = c('A', 'B'))
# A B
# 22 14
Or, if we specify a value that does not exist in x:
tab_specific_values(vector = x, values = c('A', 'B', 'E'))
# A B E
# 22 14 0
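To make the one-liner easier to follow, here is a step-by-step sketch of the same logic. The name tab_specific_values_verbose and the small vector x_small are only illustrative and are not part of the original attempt.

```r
# Step-by-step version of the one-liner above (illustrative name, same logic)
tab_specific_values_verbose <- function(vector, values) {
  # Logical matrix: one row per requested value, one column per element of `vector`
  hits <- outer(values, vector, `==`)
  # Summing each row counts how often that value occurs
  counts <- rowSums(hits)
  # Label each count with the value it belongs to
  names(counts) <- values
  counts
}

x_small <- c("A", "B", "A", "C")  # hypothetical toy input
res_small <- tab_specific_values_verbose(x_small, c("A", "B", "E"))
res_small
# A B E
# 2 1 0
```

Two things to keep in mind with this approach: rowSums() returns doubles rather than integers, and the intermediate matrix has length(values) * length(vector) cells, so memory use grows with both inputs.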
Is there an existing dedicated function that does this, or do you have a better approach? I suspect my function tab_specific_values might not be the best approach.
CodePudding user response:
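Convert x to a factor whose levels are exactly the values you want, then call table(). Values of x that are not among the levels become NA and are dropped, and levels that never occur are counted as 0:
# values to count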
v <- c("A", "B", "E")
table(factor(x, levels = v))
# A B E
# 22 14 0
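A small sketch of how the levels drive the result, using a hypothetical short vector x_demo rather than the x from the question: the order of the levels sets the order of the output, and anything outside the levels is turned into NA before counting. The conversion to a plain named vector at the end is optional and is only one way to do it.

```r
x_demo <- c("C", "A", "B", "B", "A", "D")  # hypothetical toy input
v_demo <- c("B", "A", "E")                 # note: deliberately not alphabetical

f <- factor(x_demo, levels = v_demo)
# "C" and "D" are not in the levels, so they become NA
n_dropped <- sum(is.na(f))

counts <- table(f)
counts
# f
# B A E
# 2 2 0

# Optional: convert the table object to a plain named integer vector
counts_vec <- setNames(as.vector(counts), names(counts))
```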
CodePudding user response:
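Benchmarking three ways of counting only 'C' and 'D': a excludes the unwanted values (which requires knowing all of them up front), b is the factor approach, and c is the function from the question:
library(microbenchmark)
microbenchmark(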
a = table(x, exclude = c('A', 'B')),
b = table(factor(x, levels = c('C', 'D'))),
c = tab_specific_values(vector = x, values = c('C', 'D')),
times = 1000
)
Unit: microseconds
 expr     min       lq      mean  median       uq       max neval
    a 116.401 131.6505 177.20030 145.201 236.8010   604.701  1000
    b  49.302  60.0010  92.33422  66.501 109.4510 10974.101  1000
    c  13.301  20.1005  29.09018  24.201  36.3015   134.901  1000
When x has 1,000,000 elements:
Unit: milliseconds
 expr      min        lq      mean    median        uq      max neval
    a 119.3651 131.24110 142.63383 137.50385 144.07945 233.1265   100
    b  43.9441  48.18640  58.24316  54.75485  59.12390 129.5087   100
    c  48.9598  55.33825  67.03932  62.64145  65.93755 152.9490   100
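Before comparing timings it is worth confirming that the three expressions return the same counts. The sketch below does that on a long vector. How the 1,000,000-element vector was built is not shown above, so the sample() call, the seed, and the name x_big are assumptions made only for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
# Assumption: random draws from the same four letters as the original x
x_big <- sample(c("A", "B", "C", "D"), 1e6, replace = TRUE)

# Function from the question, repeated so the sketch is self-contained
tab_specific_values <- function(vector, values) `names<-`(rowSums(outer(values, vector, `==`)), values)

keep <- c("C", "D")
res_a <- table(x_big, exclude = c("A", "B"))
res_b <- table(factor(x_big, levels = keep))
res_c <- tab_specific_values(vector = x_big, values = keep)

# Compare labels and bare counts (the objects differ in class and storage type)
same_names <- identical(names(res_a), keep) &&
  identical(names(res_b), keep) &&
  identical(names(res_c), keep)
same_counts <- all(as.vector(res_a) == as.vector(res_b)) &&
  all(as.vector(res_b) == as.vector(res_c))
```

If same_names and same_counts are both TRUE, the benchmark is comparing like with like. Timings depend on the machine and on the number of distinct values requested, so it is best to rerun microbenchmark() on your own data.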