Recode missing values in multiple columns: mutate with across and ifelse

Time:10-19

I am working with an SPSS file that has been exported as tab-delimited text. SPSS lets you designate specific values as codes for different types of missing data, and this dataset uses 98 and 99 for that purpose.

I want to convert them to NA but only in certain columns (V2 and V3 in the example data, leaving V1 and V4 unchanged).

library(dplyr)
testdf <- data.frame(V1 = c(1, 2, 3, 4),
                     V2 = c(1, 98, 99, 2),
                     V3 = c(1, 99, 2, 3),
                     V4 = c(98, 99, 1, 2))
outdf <- testdf %>% 
  mutate(across(V2:V3), . = ifelse(. %in% c(98,99), NA, .))

I haven't used across before and cannot work out how to have the mutate return the ifelse into the same columns. I suspect I am overthinking this, but can't find any similar examples that have both across and ifelse. I need a tidyverse answer, prefer dplyr or tidyr.

CodePudding user response:

Your code needs two syntax changes to work. See ?across for details.

  1. The function must be written as a lambda: prefix the expression with ~ (or use \(.) or function(.)).
  2. The function must be passed inside the across() call, as its second argument, rather than after the closing parenthesis.

library(dplyr)
testdf %>% 
  mutate(across(V2:V3, ~ ifelse(. %in% c(98,99), NA, .)))

#   V1 V2 V3 V4
# 1  1  1  1 98
# 2  2 NA NA 99
# 3  3 NA  2  1
# 4  4  2  3  2 
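
The three ways of writing the function mentioned in point 1 can be compared side by side. This is only a sketch on the question's example data; the names missing_codes and recode_missing are made up for illustration, and the \(x) shorthand assumes R 4.1 or later.

```r
library(dplyr)

# Same example data as in the question
testdf <- data.frame(V1 = c(1, 2, 3, 4),
                     V2 = c(1, 98, 99, 2),
                     V3 = c(1, 99, 2, 3),
                     V4 = c(98, 99, 1, 2))

# Hypothetical name for the codes that mean "missing"
missing_codes <- c(98, 99)

# Formula shorthand: ~ turns the expression into a function of .x
out_formula <- testdf %>%
  mutate(across(V2:V3, ~ ifelse(.x %in% missing_codes, NA, .x)))

# Anonymous-function shorthand (assumes R >= 4.1)
out_lambda <- testdf %>%
  mutate(across(V2:V3, \(x) ifelse(x %in% missing_codes, NA, x)))

# Named function (hypothetical helper), passed by name
recode_missing <- function(x) ifelse(x %in% missing_codes, NA, x)
out_named <- testdf %>%
  mutate(across(V2:V3, recode_missing))
```

All three calls should produce the same data frame; the named-function form is probably the easiest to reuse if the same recoding is needed in several places.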

Note that an alternative is replace:

testdf %>% 
  mutate(across(V2:V3, ~ replace(., . %in% c(98,99), NA)))

CodePudding user response:

A base R option, using lapply with ifelse:

cols <- c("V2","V3")
testdf[,cols] <- lapply(testdf[,cols],function(x) ifelse(x %in% c(98,99),NA,x))
testdf
#>   V1 V2 V3 V4
#> 1  1  1  1 98
#> 2  2 NA NA 99
#> 3  3 NA  2  1
#> 4  4  2  3  2

Created on 2022-10-19 with reprex v2.0.2

CodePudding user response:

Base R, assuming no valid value in these columns is greater than 97:

cols <- c("V2", "V3")
testdf[, cols ][ testdf[, cols ] > 97 ] <- NA
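
The threshold comparison and an exact match on 98 and 99 only give different results when a column holds a legitimate value above 97. A minimal sketch of that difference, using a made-up data frame (scores, with 100 assumed to be a valid value) rather than the question's data:

```r
# Hypothetical data: 100 is a valid score here, 98 and 99 are missing codes
scores <- data.frame(V2 = c(1, 98, 100),
                     V3 = c(99, 2, 3))
cols <- c("V2", "V3")

# Threshold version: everything above 97 becomes NA, including the valid 100
by_threshold <- scores
by_threshold[, cols][by_threshold[, cols] > 97] <- NA

# Exact-match version: only 98 and 99 become NA
by_match <- scores
by_match[cols] <- lapply(by_match[cols],
                         function(x) replace(x, x %in% c(98, 99), NA))
```

If the columns can never contain values above 97, the two approaches are interchangeable and the threshold version is shorter.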