I would like my "count by group" to take into account NA values from a specified c
library(data.table)
sample <- fread("
1,0,2,2,cat X
3,4,3,NA,cat Y
1,0,2,2,cat X
3,4,3,0,cat Y
")
names(sample) <- c("A","B","C", "D", "cat")
sample <- sample[,observations:= .N, by=cat]
A B C D cat observations
1: 1 0 2 2 cat X 2
2: 3 4 3 NA cat Y 2
3: 1 0 2 2 cat X 2
4: 3 4 3 0 cat Y 2
I would like to be able to specify that I want to count the available (non-NA) observations with respect to column D. Because there is an NA in D for cat Y, that group has only one observation with respect to D.
Desired output:
A B C D cat observations
1: 1 0 2 2 cat X 2
2: 3 4 3 NA cat Y 1
3: 1 0 2 2 cat X 2
4: 3 4 3 0 cat Y 1
Is there a way to specify this?
CodePudding user response:
.N counts every row in the group regardless of any column's values (like dplyr::n()), so instead you can count the non-NA values of column D by summing !is.na(D):
sample[, observations := sum(!is.na(D)), by = cat][]
A B C D cat observations
1: 1 0 2 2 cat X 2
2: 3 4 3 NA cat Y 1
3: 1 0 2 2 cat X 2
4: 3 4 3 0 cat Y 1
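To see the difference between the two counts side by side, here is a minimal sketch that rebuilds the question's data and adds both columns in one call. The names dt, n_rows and n_obs_D are made up for illustration and do not appear in the original code.

```r
library(data.table)

# Same toy data as in the question, built directly instead of via fread()
dt <- data.table(
  A   = c(1, 3, 1, 3),
  B   = c(0, 4, 0, 4),
  C   = c(2, 3, 2, 3),
  D   = c(2, NA, 2, 0),
  cat = c("cat X", "cat Y", "cat X", "cat Y")
)

# n_rows  (hypothetical name): .N, the number of rows in each group
# n_obs_D (hypothetical name): number of rows in each group where D is not NA
dt[, `:=`(n_rows = .N, n_obs_D = sum(!is.na(D))), by = cat]

print(dt)
```

is.na(D) returns a logical vector, and sum() treats TRUE as 1, so sum(!is.na(D)) is the number of non-missing entries of D within each group.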