I am working with an SPSS file that has been exported as tab-delimited text. In SPSS you can define values that represent different types of missing data, and this dataset uses 98 and 99 to indicate missing.
I want to convert them to NA, but only in certain columns (V2 and V3 in the example data, leaving V1 and V4 unchanged).
library(dplyr)
testdf <- data.frame(V1 = c(1, 2, 3, 4),
                     V2 = c(1, 98, 99, 2),
                     V3 = c(1, 99, 2, 3),
                     V4 = c(98, 99, 1, 2))
outdf <- testdf %>%
  mutate(across(V2:V3), . = ifelse(. %in% c(98,99), NA, .))
I haven't used across() before and cannot work out how to have mutate() write the ifelse() result back into the same columns. I suspect I am overthinking this, but I can't find any similar examples that use both across() and ifelse(). I need a tidyverse answer, preferably dplyr or tidyr.
CodePudding user response:
The syntax needs to be slightly different to make it work. Check ?across for more info.
- You need a ~ to turn the expression into a valid function (or use \(.), or use function(.)).
- You need to pass that function inside the across() call, as its second argument.
library(dplyr)
testdf %>%
  mutate(across(V2:V3, ~ ifelse(. %in% c(98,99), NA, .)))
# V1 V2 V3 V4
# 1 1 1 1 98
# 2 2 NA NA 99
# 3 3 NA 2 1
# 4 4 2 3 2
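The three ways of writing the function mentioned above can be compared side by side. This is only a sketch on the question's example data; the object names out_formula, out_lambda and out_function are made up for illustration, and the \(x) shorthand assumes R 4.1 or later.

```r
library(dplyr)

# Example data from the question
testdf <- data.frame(V1 = c(1, 2, 3, 4),
                     V2 = c(1, 98, 99, 2),
                     V3 = c(1, 99, 2, 3),
                     V4 = c(98, 99, 1, 2))

# 1. Formula-style lambda (the ~ form used in the answer)
out_formula <- testdf %>%
  mutate(across(V2:V3, ~ ifelse(.x %in% c(98, 99), NA, .x)))

# 2. Base R shorthand lambda (assumes R >= 4.1)
out_lambda <- testdf %>%
  mutate(across(V2:V3, \(x) ifelse(x %in% c(98, 99), NA, x)))

# 3. Ordinary anonymous function
out_function <- testdf %>%
  mutate(across(V2:V3, function(x) ifelse(x %in% c(98, 99), NA, x)))

# Check that all three give the same data frame
identical(out_formula, out_lambda)
identical(out_lambda, out_function)
```

Which form to use is mostly a matter of taste; the explicit function forms let you name the argument, which can be easier to read in longer expressions.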
Note that an alternative is replace():
testdf %>%
  mutate(across(V2:V3, ~ replace(., . %in% c(98,99), NA)))
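If the columns to clean are not adjacent, or their names are stored in a variable, the same replace() idea can be combined with all_of(). The following is a minimal sketch, assuming the target columns and the missing codes are kept in two vectors; the names cols and missing_codes are hypothetical and not part of the original question.

```r
library(dplyr)

# Example data from the question
testdf <- data.frame(V1 = c(1, 2, 3, 4),
                     V2 = c(1, 98, 99, 2),
                     V3 = c(1, 99, 2, 3),
                     V4 = c(98, 99, 1, 2))

cols <- c("V2", "V3")        # hypothetical name: columns to recode
missing_codes <- c(98, 99)   # hypothetical name: codes that mean "missing"

# all_of() selects columns from a character vector;
# replace() sets the matching positions to NA and leaves the rest untouched
outdf <- testdf %>%
  mutate(across(all_of(cols), ~ replace(.x, .x %in% missing_codes, NA)))

outdf
```

Keeping the codes in one vector means a dataset with additional missing codes only needs a change in one place.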
CodePudding user response:
A base R option using lapply() with ifelse(), like this:
cols <- c("V2","V3")
testdf[,cols] <- lapply(testdf[,cols],function(x) ifelse(x %in% c(98,99),NA,x))
testdf
#> V1 V2 V3 V4
#> 1 1 1 1 98
#> 2 2 NA NA 99
#> 3 3 NA 2 1
#> 4 4 2 3 2
Created on 2022-10-19 with reprex v2.0.2
CodePudding user response:
Base R:
cols <- c("V2", "V3")
testdf[, cols][testdf[, cols] > 97] <- NA
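One thing to keep in mind with the threshold comparison is that it is not the same test as matching 98 and 99 exactly. The sketch below uses a made-up vector x in which 100 is assumed to be a legitimate value, purely to show the difference between the two conditions.

```r
# Hypothetical vector: 98 and 99 are missing codes, 100 is a real value
x <- c(1, 98, 100, 99)

# Threshold comparison: everything above 97 becomes NA, including 100
x_gt <- x
x_gt[x_gt > 97] <- NA

# Exact match: only the listed missing codes become NA
x_in <- x
x_in[x_in %in% c(98, 99)] <- NA

x_gt
x_in
```

For the example data in the question both conditions give the same result, so the shorter > 97 form is fine whenever no valid value can exceed 97.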