Counting group size including zero with R's data.table

Time:11-07

I have a small (< 10M row) data.table with a variable that takes on integer values. I would like to count the number of times the variable takes on each integer value, including a count of zero for any value that never occurs.

For example, I might have:

dt <- data.table(a = c(1,1,3,3,5,5,5))

My desired output is a data.table with values:

a N
1 2
2 0
3 2
4 0
5 3

This is an extremely basic question, but it is difficult to find data.table-specific answers for it. In my example, we can assume that the minimum is always 0, but the maximum variable value is unknown.

CodePudding user response:

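library(data.table)

# Count each observed value, join those counts onto the full min..max range,
# then set the counts of unobserved values (NA after the join) to 0.
dt[, .N, by = .(a)
  ][data.table(a = seq(min(dt$a), max(dt$a))), on = .(a)
  ][is.na(N), N := 0L][]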
#        a     N
#    <int> <int>
# 1:     1     2
# 2:     2     0
# 3:     3     2
# 4:     4     0
# 5:     5     3
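
As a possible alternative to the join-then-fill approach above, the counting can be done inside the join itself with `by = .EACHI`, where `.N` is 0 for lookup rows that match nothing. This is only a sketch: the name `all_values` is made up for illustration, and the example data is stored as integers (an assumption, so that both sides of the join have the same type).

```r
library(data.table)

# Example data from the question, stored as integers (assumption).
dt <- data.table(a = c(1L, 1L, 3L, 3L, 5L, 5L, 5L))

# Hypothetical helper table: every value from the observed minimum
# to the observed maximum, including values absent from dt.
all_values <- data.table(a = seq(min(dt$a), max(dt$a)))

# For each row of all_values, count the matching rows in dt.
# With by = .EACHI, .N is 0 when a lookup row has no match.
counts <- dt[all_values, on = .(a), .N, by = .EACHI]
print(counts)
```

This avoids the separate `is.na(N)` fix-up step. If the range should start at a fixed value such as 0 rather than the observed minimum, `min(dt$a)` can be replaced with `0L`.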