Recode only certain values and keep others as it is in R

Time:03-11

I am trying to recode a set of columns, var1:var8, in the data frame sampledf, changing the values "B" and "D" to "0" while keeping the other values as they are.

sampledf <- data.frame(
  var1 = c(1,4,2,1,1,0,0,1,0,0,0),
  var2 = c(1,1,"D",1,0,0,1,"B",0,"D",0),
  var3 = c(1,5,2,1,"B",0,1,1,1,0,0),
  var4 = c(1,1,0,1,2,0,1,1,5,1,1),
  var5 = c(0,4,"D",1,0,0,0,1,1,1,1),
  var6 = c(1,"D",0,1,0,2,1,1,0,1,0),
  var7 = c(1,1,0,0,1,"E",1,0,"D",1,1),
  var8 = c(1,1,0,0,2,5,1,"D",0,3,1))

This is what I tried, but it did not work. Unlike this example, my real dataset has a very long list of other values, so I cannot supply all of them manually. All I want is to change these two values and keep the others as they are.

sampledfnew <- sampledf %>% mutate(across(var1:var2, ~recode(
  .x,
  'B'=0L,
  'D'=0L,
  TRUE ~ X,
)))

Can anyone help me fix the error here? Thank you

CodePudding user response:

There are many ways to do this. Using ifelse (applied to var1:var2 here, as in your attempt; use var1:var8 to cover every column) -

library(dplyr)

change_values <- c('B', 'D')
sampledf %>% mutate(across(var1:var2, ~ifelse(.x %in% change_values, 0, .x)))

#   var1 var2 var3 var4 var5 var6 var7 var8
#1     1    1    1    1    0    1    1    1
#2     4    1    5    1    4    D    1    1
#3     2    0    2    0    D    0    0    0
#4     1    1    1    1    1    1    0    0
#5     1    0    B    2    0    0    1    2
#6     0    0    0    0    0    2    E    5
#7     0    1    1    1    0    1    1    1
#8     1    0    1    1    1    1    0    D
#9     0    0    1    5    1    0    D    0
#10    0    0    0    1    1    1    1    3
#11    0    0    0    1    1    0    1    1

CodePudding user response:

Here are some alternatives to ifelse, which is prone to at least two not-insignificant issues (class-dropping and class-ambiguity, discussed below).

sampledf %>%
  mutate(
    across(var1:var8, ~ if_else(
      . %in% c("B", "D"),
      if (is.character(.)) "0" else 0, # could also be maybechar(0, .) from below
      .)
    )
  )
#    var1 var2 var3 var4 var5 var6 var7 var8
# 1     1    1    1    1    0    1    1    1
# 2     4    1    5    1    4    0    1    1
# 3     2    0    2    0    0    0    0    0
# 4     1    1    1    1    1    1    0    0
# 5     1    0    0    2    0    0    1    2
# 6     0    0    0    0    0    2    E    5
# 7     0    1    1    1    0    1    1    1
# 8     1    0    1    1    1    1    0    0
# 9     0    0    1    5    1    0    0    0
# 10    0    0    0    1    1    1    1    3
# 11    0    0    0    1    1    0    1    1

In case you don't always want B and D to be replaced with the same value, case_when lets you give each one its own replacement:

maybechar <- function(val, src) if (is.character(src)) as.character(val) else val
sampledf %>%
  mutate(
    across(var1:var8, ~ case_when(
      . == "B" ~ maybechar(0, .),
      . == "D" ~ maybechar(0, .),
      TRUE ~ .)
    )
  )
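When B and D should map to different values, the same idea can also be sketched in base R with a named lookup vector instead of case_when. This is only an illustration: the replacement values ("0" for B, "9" for D) and the helper name recode_some are made up here and are not part of the answer above.

```r
# Hypothetical mapping: names are the old values, elements are the new ones.
lookup <- c(B = "0", D = "9")

# Recode only the values named in `lookup`; everything else is returned as-is.
recode_some <- function(x, lookup) {
  hit <- x %in% names(lookup)
  if (!any(hit)) return(x)  # nothing to recode, so leave the column untouched
  x[hit] <- unname(lookup[as.character(x[hit])])
  x
}

# var2 from the question (already character because it mixes numbers and letters)
var2 <- c("1", "1", "D", "1", "0", "0", "1", "B", "0", "D", "0")
recoded <- recode_some(var2, lookup)
recoded
```

To run it over the whole data frame, something like sampledf[] <- lapply(sampledf, recode_some, lookup = lookup) should work, with numeric columns passing through unchanged because of the early return.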

Notes:

  • Most of the replacement being done here actually substitutes the string "0" rather than the number 0, because most of your columns are character.

  • The use of ifelse by itself is something I often recommend against due to class ambiguity. With ifelse it is possible to change the class of the return value without realizing it. Compare ifelse(c(T,T), 1:2, c("A","B")) with ifelse(c(T,F), 1:2, c("A","B")) to see what I mean. This is "dangerous"/risky, and it is one thing that if_else explicitly guards against. (This is also enforced by case_when in my second code block.)

  • It is because of the previous bullet that I suggested something like maybechar. It may look a little sloppy, but it is at least more declarative and intentional. I give two ways to do it: the first, without a helper function, is the inline if/else in the if_else example above; the second uses the helper function. The helper seems more prudent with case_when, since the operation is done multiple times, so the code is a little easier to read (imo).
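
The class ambiguity described in the notes can be seen directly in base R. A minimal sketch, using only the two ifelse calls quoted above (the variable names all_true and mixed are just for illustration):

```r
# Both calls pass the same `yes` and `no` vectors; only the test differs.
all_true <- ifelse(c(TRUE, TRUE), 1:2, c("A", "B"))
mixed    <- ifelse(c(TRUE, FALSE), 1:2, c("A", "B"))

class(all_true)  # "integer": `no` was never used, so nothing was coerced
class(mixed)     # "character": one element came from `no`, so all became strings
mixed            # "1" "B"
```

The result type therefore depends on the data in the test vector, not just on the code. dplyr's if_else refuses this combination of integer and character inputs with an error instead of coercing silently.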

CodePudding user response:

Another base R solution is:

sampledf[apply(sampledf, 2, \(x) x %in% c("B", "D"))] <- 0

> sampledf
   var1 var2 var3 var4 var5 var6 var7 var8
1     1    1    1    1    0    1    1    1
2     4    1    5    1    4    0    1    1
3     2    0    2    0    0    0    0    0
4     1    1    1    1    1    1    0    0
5     1    0    0    2    0    0    1    2
6     0    0    0    0    0    2    E    5
7     0    1    1    1    0    1    1    1
8     1    0    1    1    1    1    0    0
9     0    0    1    5    1    0    0    0
10    0    0    0    1    1    1    1    3
11    0    0    0    1    1    0    1    1
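
The one-liner above works through apply(), which operates on a matrix copy of the data frame and returns a logical mask. For comparison, here is a column-by-column sketch in base R. It is not part of the answer above; the names df, targets and cleaned are made up for illustration, and df is a small stand-in for sampledf. It picks the replacement to match each column's type, so numeric columns stay numeric and character columns receive "0".

```r
# Small stand-in for the question's data: one numeric and two character columns.
df <- data.frame(
  var1 = c(1, 4, 2),
  var2 = c("1", "D", "B"),
  var7 = c("E", "0", "D"),
  stringsAsFactors = FALSE  # keep strings as character on R < 4.0 too
)

targets <- c("B", "D")  # values to recode (from the question)

# Replace the targets column by column, matching the replacement's type
# to the column so that no column changes class.
cleaned <- df
cleaned[] <- lapply(df, function(x) {
  replace(x, x %in% targets, if (is.character(x)) "0" else 0)
})

str(cleaned)
```

Unlike the in-place assignment above, this leaves the original df untouched and stores the result in cleaned.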