Home > Software engineering >  Calculating observations by group for multiple variables
Calculating observations by group for multiple variables

Time:10-16

I have data as follows, with which I want to calculate observations by group as follows:

library(data.table)
dat <- fread("col1 col2 col3 group
       1    2    4    A
       3    2    2    A
       1    NA   1    B
       3    2    1    B")

vars_of_interest <- c("col1", "col2")
vars_of_interest_obs <- paste0(vars_of_interest, "_obs_tot")

dat <- setDT(dat)[, (vars_of_interest_obs) := sum(!is.na(vars_of_interest)), by = c("group")]

However the outcome of this is:

   col1 col2 col3 group col1_obs_tot col2_obs_tot
1:    1    2    4     A            2            2
2:    3    2    2     A            2            2
3:    1   NA    1     B            2            2
4:    3    2    1     B            2            2

Where the last column should be

col2_obs_tot
2
2
1
1

What am I doing wrong here?

CodePudding user response:

This is because vars_of_interest are evaluated litteraly:
sum(is.na('col1','col2')) = 2

You need to get their content:

setDT(dat)[, (vars_of_interest_obs) := lapply(vars_of_interest, function(x)  sum(!is.na(get(x)))), by = c("group")][]

    col1  col2  col3  group col1_obs_tot col2_obs_tot
   <int> <int> <int> <char>        <int>        <int>
1:     1     2     4      A            2            2
2:     3     2     2      A            2            2
3:     1    NA     1      B            2            1
4:     3     2     1      B            2            1
  • Related