Efficiently condensing several ifelse statements

Time:02-01

I have the following lines of code (where I recode specific values in my age variable) that I would like to condense using base R, ideally into one or two lines. Any help would be appreciated.

ibr.sub$Age.recode <- ibr.sub$age
ibr.sub$Age.recode <- ifelse(ibr.sub$age == 4, 3, ibr.sub$Age.recode)
ibr.sub$Age.recode <- ifelse(ibr.sub$age == 20, 18, ibr.sub$Age.recode)
ibr.sub$Age.recode <- ifelse(ibr.sub$age == 22, 24, ibr.sub$Age.recode)
ibr.sub$Age.recode <- ifelse(ibr.sub$age == 26, 24, ibr.sub$Age.recode)
ibr.sub$Age.recode <- ifelse(ibr.sub$age == 31, 30, ibr.sub$Age.recode)

Answer:

If the numbers tested are the only values x (here ibr.sub$age) can take, then:

y <- 3 * (x == 4) + 18 * (x == 20) + 24 * (x %in% c(22, 26)) + 30 * (x == 31)

If x can take other values too, run the line above and follow it with this, which keeps x wherever no condition matched (y is 0 there):

y <- (y == 0) * x + y
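A quick base R check of the arithmetic approach, using a hypothetical sample vector x that stands in for ibr.sub$age. It assumes none of the recode targets is 0, because y == 0 is used to mean "no rule matched":

```r
# Hypothetical sample ages; x plays the role of ibr.sub$age
x <- c(4, 5, 20, 22, 26, 30, 31)

# Each comparison is coerced to 0/1, so only the matching term contributes
y <- 3 * (x == 4) + 18 * (x == 20) + 24 * (x %in% c(22, 26)) + 30 * (x == 31)

# Where no rule matched, y is 0: fall back to the original value there
y <- (y == 0) * x + y

print(y)  # c(3, 5, 18, 24, 24, 30, 30)
```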

Answer:

Here is a solution that uses a named vector as a lookup table; your input data may also contain values that should not be recoded:

data:

library(tibble)  # for tibble(); a base R data.frame() works the same way here
ibr.sub <- tibble(age = c(4,4,5,6,20,22,23,22,26,31,30,31))

Solution:

recode_vec = c("4" = "3", "20" = "18", "22" = "24", "26" = "24", "31" = "30")
ibr.sub$Age.recode <- as.numeric(recode_vec[as.character(ibr.sub$age)])

Output:

# A tibble: 12 × 2
     age Age.recode
   <dbl>      <dbl>
 1     4          3
 2     4          3
 3     5         NA
 4     6         NA
 5    20         18
 6    22         24
 7    23         NA
 8    22         24
 9    26         24
10    31         30
11    30         NA
12    31         30

If you want Age.recode to keep the original age for values that are not in recode_vec, check whether each lookup returns NA and, if it does, take the value from age:

ibr.sub <- tibble(age = c(4,4,5,6,20,22,23,22,26,31,30,31, 3, 3))
recode_vec = c("4" = "3", "20" = "18", "22" = "24", "26" = "24", "31" = "30")
ibr.sub$Age.recode <- as.numeric(ifelse(is.na(recode_vec[as.character(ibr.sub$age)]), ibr.sub$age, recode_vec[as.character(ibr.sub$age)]))
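A variant of the same lookup, sketched in base R with a plain data.frame and hypothetical sample ages. Storing numeric values in the named vector means the result is already numeric, so the as.numeric() round trip is not needed; unname() drops the names that indexing by name attaches to the result:

```r
# Hypothetical sample data as a base R data.frame
ibr.sub <- data.frame(age = c(4, 5, 20, 22, 23, 26, 31, 30, 3))

# Lookup table with numeric values: names are the old ages, values the new ones
recode_vec <- c("4" = 3, "20" = 18, "22" = 24, "26" = 24, "31" = 30)

# Look every age up once; ages that are not names in recode_vec give NA
looked_up <- unname(recode_vec[as.character(ibr.sub$age)])

# Keep the original age wherever the lookup returned NA
ibr.sub$Age.recode <- ifelse(is.na(looked_up), ibr.sub$age, looked_up)

print(ibr.sub)
```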

Output:

# A tibble: 14 × 2
     age Age.recode
   <dbl>      <dbl>
 1     4          3
 2     4          3
 3     5          5
 4     6          6
 5    20         18
 6    22         24
 7    23         23
 8    22         24
 9    26         24
10    31         30
11    30         30
12    31         30
13     3          3
14     3          3

Code explanation: recode_vec is just a named vector, i.e. every element has a name attached (in each pair, the left side of = is the name and the right side is the value):

> recode_vec
   4   20   22   26   31 
 "3" "18" "24" "24" "30" 

This works like a dictionary. ibr.sub$age is numeric, so you convert it to character (as.character(ibr.sub$age)) so that its values can be looked up among the names of recode_vec.

To access an element of a named vector, put its name between brackets. For example:

> recode_vec["4"]
  4 
"3" 
> recode_vec["5"] # "5" is not a name in the vector, so NA is returned
<NA> 
  NA 
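The character conversion matters because a number inside the brackets selects by position, not by name. A minimal illustration with the same recode_vec:

```r
recode_vec <- c("4" = "3", "20" = "18", "22" = "24", "26" = "24", "31" = "30")

# A character index is looked up by name: the element named "4"
by_name <- recode_vec["4"]

# A numeric index selects by position: the 4th element, which is named "26"
by_position <- recode_vec[4]

print(by_name)      # named "4", value "3"
print(by_position)  # named "26", value "24"
```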

If you pass a vector of names (here, the values of ibr.sub$age as characters), you get the whole vector translated:

> recode_vec[c("4", "4", "5")]
   4    4 <NA> 
 "3"  "3"   NA 

Finally, ifelse() together with is.na() (which returns a logical vector) finds the values that were translated to NA and replaces them with the original data from the age column. Because recode_vec holds character values, the result is a character vector, so convert it with as.numeric() if you want a numeric column.

Answer:

Something like this?

current <- c(4, 20, 22, 26, 31)
new <- c(3, 18, 24, 24, 30)

i <- match(ibr.sub$age, current)   # position of each age in current, NA if absent
ibr.sub$Age.recode <- new[i]

This assumes every value of age appears in current. In a single instruction:

ibr.sub$Age.recode <- c(3, 18, 24, 24, 30)[match(ibr.sub$age, c(4, 20, 22, 26, 31))]

If column age has values not in c(4, 20, 22, 26, 31), match() returns NA for them and those rows of Age.recode become NA. To keep the original age in those rows, overwrite only the rows that matched:

i <- match(ibr.sub$age, c(4, 20, 22, 26, 31))
ok <- !is.na(i)
ibr.sub$Age.recode <- ibr.sub$age
ibr.sub$Age.recode[ok] <- c(3, 18, 24, 24, 30)[ i[ok] ]
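A detail worth checking with match(): it returns positions in the lookup vector, not row numbers, so the left-hand side of the assignment has to be indexed by the rows that matched rather than by i itself. A small base R check with two hypothetical ages shows the difference:

```r
ages    <- c(20, 4)               # hypothetical: 20 should become 18, 4 should become 3
current <- c(4, 20, 22, 26, 31)
new     <- c(3, 18, 24, 24, 30)

i <- match(ages, current)         # c(2, 1): positions in current, not row numbers

# Indexing the target by i writes 18 into row 2 and 3 into row 1
wrong <- ages
wrong[i] <- new[i]

# Indexing the target by the matched rows puts each new value in its own row
right <- ages
ok <- !is.na(i)
right[ok] <- new[i[ok]]

print(wrong)  # c(3, 18)
print(right)  # c(18, 3)
```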

Answer:

If you are open to using the tidyverse instead of base R, dplyr's case_when() makes each recoding rule explicit:

library(dplyr)

ibr.sub <- ibr.sub %>%              # your data.frame
  mutate(Age.recode = case_when(    # case_when: what to return in each case
    age == 4 ~ 3,                   # when age == 4, return 3
    age == 20 ~ 18,                 # same idea for the other values
    age == 22 ~ 24,
    age == 26 ~ 24,
    age == 31 ~ 30,
    TRUE ~ age))                    # any other value: keep age unchanged

ibr.sub then looks like this (the id column is only for illustration):

  id age Age.recode
1  1   3          3
2  2   4          3
3  3   5          5
4  4  20         18
5  5  21         21
6  6  22         24
7  7  26         24
8  8  31         30