Home > OS >  Applying ifelse in multiple variables / columns for replacing 99 and 999 to NA
Applying ifelse in multiple variables / columns for replacing 99 and 999 to NA

Time:11-22

I have a dataframe with some columns where 99 should be considered as missing values (NA) and other columns where 999 was the value given for this purpose.

dat$variable1 <- ifelse(dat$variable1 == 99, NA, dat$EC)
dat$variable2 <- ifelse(dat$variable2 == 99, NA, dat$EC)
dat$variable3 <- ifelse(dat$variable3 == 99, NA, dat$EC)
dat$variable4 <- ifelse(dat$variable4 == 99, NA, dat$EC)
dat$variable5 <- ifelse(dat$variable5 == 999, NA, dat$EC)
dat$variable6 <- ifelse(dat$variable6 == 999, NA, dat$EC)
dat$variable7 <- ifelse(dat$variable7 == 999, NA, dat$EC)

I'd like to find a better way to do that, because sometimes we can have many many columns to deal with. I don't know how to loop over the specific variables that I should replace these values for NA and I'm not aware of a package that could help me with that (I'm a beginner in R).

CodePudding user response:

If 99 and 999 are unique values for missing in your data frame dat, you could just:

dat[dat == 999] <- NA
dat[dat == 99] <- NA

If not, you could use na_if from dplyr

dat %>%
  mutate(across(c(variable1:variable4), na_if, 99), 
         across(c(variable5:variable7), na_if, 999))

CodePudding user response:

You may try using dplyr::across.

For a dummy data dat defined like

dat <- data.frame(
  variable1 = c(1,2,3,4,5,6,99),
  variable2 = c(1,2,99,4,5,6,7),
  variable3 = c(1:7),
  variable4 = c(5:11),
  variable5 = c(1,2,3,4,5,6,999),
  variable6 = c(1,2,3,4,999,6,7),
  variable7 = c(1:7),
  EC = c(-1,-2,-3,-4,-5,-6,-7)
)

  variable1 variable2 variable3 variable4 variable5 variable6 variable7 EC
1         1         1         1         5         1         1         1 -1
2         2         2         2         6         2         2         2 -2
3         3        99         3         7         3         3         3 -3
4         4         4         4         8         4         4         4 -4
5         5         5         5         9         5       999         5 -5
6         6         6         6        10         6         6         6 -6
7        99         7         7        11       999         7         7 -7

You may try this way.

library(dplyr)
dat %>%
  rowwise %>%
  mutate(across(variable1:variable4, ~ifelse(.x == 99, NA, EC)),
         across(variable5:variable7, ~ifelse(.x == 999, NA, EC)))

  variable1 variable2 variable3 variable4 variable5 variable6 variable7    EC
      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>
1        -1        -1        -1        -1        -1        -1        -1    -1
2        -2        -2        -2        -2        -2        -2        -2    -2
3        -3        NA        -3        -3        -3        -3        -3    -3
4        -4        -4        -4        -4        -4        -4        -4    -4
5        -5        -5        -5        -5        -5        NA        -5    -5
6        -6        -6        -6        -6        -6        -6        -6    -6
7        NA        -7        -7        -7        NA        -7        -7    -7

If you know column indexes, like in my dat, from variable1 to variable4 is 1:4 and variable5 to variable7 is 5:7, just using column indexes will give you the same result.

dat %>%
  rowwise %>%
  mutate(across(1:4, ~ifelse(.x == 99, NA, EC)),
         across(5:7, ~ifelse(.x == 999, NA, EC)))

  variable1 variable2 variable3 variable4 variable5 variable6 variable7    EC
      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl> <dbl>
1        -1        -1        -1        -1        -1        -1        -1    -1
2        -2        -2        -2        -2        -2        -2        -2    -2
3        -3        NA        -3        -3        -3        -3        -3    -3
4        -4        -4        -4        -4        -4        -4        -4    -4
5        -5        -5        -5        -5        -5        NA        -5    -5
6        -6        -6        -6        -6        -6        -6        -6    -6
7        NA        -7        -7        -7        NA        -7        -7    -7

CodePudding user response:

Consider running ifelse on a block of columns since it works on vectors and matrices:

var_99 <- c("variable1", "variable2", "variable3", "variable4")
var_999 <- c("variable5", "variable6", "variable7")

dat[var_99] <- ifelse(dat[var_99] == 99, NA, dat$EC)
dat[var_999] <- ifelse(dat[var_999] == 999, NA, dat$EC)
  • Related