I have data as follows, with which I want to calculate observations by group as follows:
library(data.table)
dat <- fread("col1 col2 col3 group
1 2 4 A
3 2 2 A
1 NA 1 B
3 2 1 B")
vars_of_interest <- c("col1", "col2")
vars_of_interest_obs <- paste0(vars_of_interest, "_obs_tot")
dat <- setDT(dat)[, (vars_of_interest_obs) := sum(!is.na(vars_of_interest)), by = c("group")]
However the outcome of this is:
col1 col2 col3 group col1_obs_tot col2_obs_tot
1: 1 2 4 A 2 2
2: 3 2 2 A 2 2
3: 1 NA 1 B 2 2
4: 3 2 1 B 2 2
Where the last column should be
col2_obs_tot
2
2
1
1
What am I doing wrong here?
CodePudding user response:
This is because vars_of_interest
are evaluated litteraly:
sum(is.na('col1','col2')) = 2
You need to get
their content:
setDT(dat)[, (vars_of_interest_obs) := lapply(vars_of_interest, function(x) sum(!is.na(get(x)))), by = c("group")][]
col1 col2 col3 group col1_obs_tot col2_obs_tot
<int> <int> <int> <char> <int> <int>
1: 1 2 4 A 2 2
2: 3 2 2 A 2 2
3: 1 NA 1 B 2 1
4: 3 2 1 B 2 1