I have a dataframe with some columns where 99 should be considered as missing values (NA) and other columns where 999 was the value given for this purpose.
dat$variable1 <- ifelse(dat$variable1 == 99, NA, dat$EC)
dat$variable2 <- ifelse(dat$variable2 == 99, NA, dat$EC)
dat$variable3 <- ifelse(dat$variable3 == 99, NA, dat$EC)
dat$variable4 <- ifelse(dat$variable4 == 99, NA, dat$EC)
dat$variable5 <- ifelse(dat$variable5 == 999, NA, dat$EC)
dat$variable6 <- ifelse(dat$variable6 == 999, NA, dat$EC)
dat$variable7 <- ifelse(dat$variable7 == 999, NA, dat$EC)
I'd like to find a better way to do that, because sometimes we can have many many columns to deal with. I don't know how to loop over the specific variables that I should replace these values for NA and I'm not aware of a package that could help me with that (I'm a beginner in R).
CodePudding user response:
If 99 and 999 are unique values for missing in your data frame dat
, you could just:
dat[dat == 999] <- NA
dat[dat == 99] <- NA
If not, you could use na_if
from dplyr
dat %>%
mutate(across(c(variable1:variable4), na_if, 99),
across(c(variable5:variable7), na_if, 999))
CodePudding user response:
You may try using dplyr::across
.
For a dummy data dat
defined like
dat <- data.frame(
variable1 = c(1,2,3,4,5,6,99),
variable2 = c(1,2,99,4,5,6,7),
variable3 = c(1:7),
variable4 = c(5:11),
variable5 = c(1,2,3,4,5,6,999),
variable6 = c(1,2,3,4,999,6,7),
variable7 = c(1:7),
EC = c(-1,-2,-3,-4,-5,-6,-7)
)
variable1 variable2 variable3 variable4 variable5 variable6 variable7 EC
1 1 1 1 5 1 1 1 -1
2 2 2 2 6 2 2 2 -2
3 3 99 3 7 3 3 3 -3
4 4 4 4 8 4 4 4 -4
5 5 5 5 9 5 999 5 -5
6 6 6 6 10 6 6 6 -6
7 99 7 7 11 999 7 7 -7
You may try this way.
library(dplyr)
dat %>%
rowwise %>%
mutate(across(variable1:variable4, ~ifelse(.x == 99, NA, EC)),
across(variable5:variable7, ~ifelse(.x == 999, NA, EC)))
variable1 variable2 variable3 variable4 variable5 variable6 variable7 EC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -1 -1 -1 -1 -1 -1 -1 -1
2 -2 -2 -2 -2 -2 -2 -2 -2
3 -3 NA -3 -3 -3 -3 -3 -3
4 -4 -4 -4 -4 -4 -4 -4 -4
5 -5 -5 -5 -5 -5 NA -5 -5
6 -6 -6 -6 -6 -6 -6 -6 -6
7 NA -7 -7 -7 NA -7 -7 -7
If you know column indexes, like in my dat
, from variable1
to variable4
is 1:4
and variable5
to variable7
is 5:7
, just using column indexes will give you the same result.
dat %>%
rowwise %>%
mutate(across(1:4, ~ifelse(.x == 99, NA, EC)),
across(5:7, ~ifelse(.x == 999, NA, EC)))
variable1 variable2 variable3 variable4 variable5 variable6 variable7 EC
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 -1 -1 -1 -1 -1 -1 -1 -1
2 -2 -2 -2 -2 -2 -2 -2 -2
3 -3 NA -3 -3 -3 -3 -3 -3
4 -4 -4 -4 -4 -4 -4 -4 -4
5 -5 -5 -5 -5 -5 NA -5 -5
6 -6 -6 -6 -6 -6 -6 -6 -6
7 NA -7 -7 -7 NA -7 -7 -7
CodePudding user response:
Consider running ifelse
on a block of columns since it works on vectors and matrices:
var_99 <- c("variable1", "variable2", "variable3", "variable4")
var_999 <- c("variable5", "variable6", "variable7")
dat[var_99] <- ifelse(dat[var_99] == 99, NA, dat$EC)
dat[var_999] <- ifelse(dat[var_999] == 999, NA, dat$EC)