Counting by group with respect to a specific column, taking the number of NAs into account

Time:09-29

I would like my "count by group" to take into account NA values in a specified column.

library(data.table)
sample <- fread("
1,0,2,2,cat X
3,4,3,NA,cat Y
1,0,2,2,cat X
3,4,3,0,cat Y
")
names(sample) <- c("A","B","C", "D", "cat")

sample[, observations := .N, by = cat]

   A B C  D   cat observations
1: 1 0 2  2 cat X           2
2: 3 4 3 NA cat Y           2
3: 1 0 2  2 cat X           2
4: 3 4 3  0 cat Y           2

I would like to be able to specify that I want to count the available observations with respect to column D. Because there is an NA in D for cat Y, that group has only one observation with respect to D.

Desired output:

   A B C  D   cat observations
1: 1 0 2  2 cat X           2
2: 3 4 3 NA cat Y           1
3: 1 0 2  2 cat X           2
4: 3 4 3  0 cat Y           1

Is there a way to specify this?

Answer:

.N counts every row in a group regardless of any column's values (like dplyr::n()), so count the non-NA values of column D instead:

sample[, observations := sum(!is.na(D)), by = cat][]

   A B C  D   cat observations
1: 1 0 2  2 cat X           2
2: 3 4 3 NA cat Y           1
3: 1 0 2  2 cat X           2
4: 3 4 3  0 cat Y           1
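
To see the difference between the two counts side by side, here is a minimal sketch on the question's data. The table name dt and the columns n_rows, n_A and n_D are made up for illustration, and the multi-column step with .SDcols is an extension that is not part of the original answer.

```r
library(data.table)

# Rebuild the question's data (named dt here to avoid masking base::sample)
dt <- data.table(
  A   = c(1, 3, 1, 3),
  B   = c(0, 4, 0, 4),
  C   = c(2, 3, 2, 3),
  D   = c(2, NA, 2, 0),
  cat = c("cat X", "cat Y", "cat X", "cat Y")
)

# .N counts every row in the group;
# sum(!is.na(D)) counts only the rows where D is not NA
dt[, `:=`(n_rows = .N, observations = sum(!is.na(D))), by = cat]

# Hypothetical extension: non-NA counts for several columns at once.
# .SDcols restricts .SD to the listed columns; results go to n_A and n_D.
cols <- c("A", "D")
dt[, paste0("n_", cols) := lapply(.SD, function(x) sum(!is.na(x))),
   by = cat, .SDcols = cols]

print(dt)
```

For cat Y, n_rows stays at 2 while observations (and n_D) drop to 1, because one of its two rows has NA in D.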