I have to calculate the number of missing values per observation in a data set. As there are several variables across multiple time periods, I thought it best to try a function to keep my syntax clean. The first part of looking up the number of missing values works fine:
data$NMISS <- data %>%
  select('x1':'x4') %>%
  apply(1, function(x) sum(is.na(x)))
But when I try to turn it into a function, I get "Error in select():! NA/NaN argument":
library(dplyr)
library(tidyverse)
data <- data.frame(x1 = c(NA, 1, 5, 1),
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, NA, 4, 9),
                   x4 = c(3, 4, 1, 2))
NMISSfunc <- function(dataFrame, variables) {
  dataFrame %>%
    select(variables) %>%
    apply(1, function(x) sum(is.na(x)))
}
data$NMISS2 <- NMISSfunc(data,'x1':'x4')
I think it doesn't like the : in the range, as it will accept c('x1','x2','x3','x4') instead of 'x1':'x4'.
Some of the ranges are over twenty columns so listing them doesn't really provide a solution to keep the syntax neat.
Any suggestions?
CodePudding user response:
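You are right that you can't pass "x1":"x4" to your function. Written directly inside select(), the quoted names are treated as columns, but as a function argument the expression is evaluated as ordinary R code, where : expects numbers, hence the NA/NaN argument error. To get this to work in a tidyverse style, your variables argument needs to be passed through to select unevaluated. Fortunately, the tidyverse has the curly-curly notation {{variables}} for handling exactly this situation: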
NMISSfunc <- function(dataFrame, variables) {
  dataFrame %>%
    select({{variables}}) %>%
    apply(1, function(x) sum(is.na(x)))
}
Now we can use x1:x4 (without quotes) and the function works as expected:
NMISSfunc(data, x1:x4)
#> [1] 1 1 0 0
Created on 2022-12-13 with reprex v2.0.2
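If the column names have to stay as strings (say, because the endpoints of each range are stored in a lookup table), a variant along these lines may help. It is only a sketch: the helper name NMISSrange and its arguments from and to are made up for illustration, and it assumes the columns in the range sit next to each other in the data frame.

```r
library(dplyr)

data <- data.frame(x1 = c(NA, 1, 5, 1),
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, NA, 4, 9),
                   x4 = c(3, 4, 1, 2))

# Hypothetical helper: `from` and `to` are column names given as strings.
NMISSrange <- function(dataFrame, from, to) {
  dataFrame %>%
    # all_of() tells select() to treat each string as a column name,
    # and : then selects every column between the two endpoints
    select(all_of(from):all_of(to)) %>%
    is.na() %>%   # logical matrix, TRUE where a value is missing
    rowSums()     # number of TRUEs per row
}

NMISSrange(data, "x1", "x4")
```

With twenty-plus columns per range, only the first and last names need to be typed, so the call stays short.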
CodePudding user response:
Why not simply,
data %>%
  mutate(NMISS = rowSums(is.na(select(., x1:x4))))
#>   x1 x2 x3 x4 NMISS
#> 1 NA  7  9  3     1
#> 2  1  1 NA  4     1
#> 3  5  1  4  1     0
#> 4  1  5  9  2     0
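Since the question asks for something reusable across several ranges, the rowSums() idea can also be wrapped in a function. The following is a rough sketch rather than a definitive version: the name add_nmiss is hypothetical, the new column is always called NMISS, and it relies on the same {{ }} forwarding so the range can be written unquoted.

```r
library(dplyr)

data <- data.frame(x1 = c(NA, 1, 5, 1),
                   x2 = c(7, 1, 1, 5),
                   x3 = c(9, NA, 4, 9),
                   x4 = c(3, 4, 1, 2))

# Hypothetical wrapper (the name add_nmiss is not from the thread).
add_nmiss <- function(dataFrame, variables) {
  # {{ }} forwards the unquoted range, e.g. x1:x4, to select()
  selected <- select(dataFrame, {{ variables }})
  # is.na() gives a logical matrix; rowSums() counts the TRUEs in each row
  dataFrame$NMISS <- unname(rowSums(is.na(selected)))
  dataFrame
}

result <- add_nmiss(data, x1:x4)
result
```

rowSums() works on the whole logical matrix at once, so it avoids the per-row function call that apply(1, ...) makes.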