Recode values based on values in other dataset

Time:01-19

I want to recode values in one dataset based on values in another dataset. My overall goal is to apply recode across multiple columns of the dataframe.

Data:

df <- data.frame(
  gender=c(1,2,1,2),
  condition=c(1,1,2,2)
)
df

  gender condition
1      1         1
2      2         1
3      1         2
4      2         2

Other dataset:

codes <- data.frame(
  gender_values= c("`1`='male', `2`='female'"),
  condition_values = c("`1`='exp', `2`='control'")
)

codes

             gender_values         condition_values
1 `1`='male', `2`='female' `1`='exp', `2`='control'

Attempt:

df %>% 
  dplyr::mutate(
  gender= dplyr::recode(gender, cat(noquote(codes[1,"gender_values"])), .default = NA_character_)
)

`1`='male', `2`='female'  gender condition
1   <NA>         1
2   <NA>         1
3   <NA>         2
4   <NA>         2

Wanted:

  gender condition
1   male       exp
2 female       exp
3   male   control
4 female   control

Answer:

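If you want to use dplyr::recode, you can use the splice operator !!! to pass the values in codes as separate replacement arguments. Because the columns of df hold numeric codes, unnamed replacements are matched by position (1 maps to the first value, 2 to the second). It helps if you simplify your codes data so that each column holds only the labels, in code order: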

codes <- data.frame(
  gender_values = c("male", "female"),
  condition_values = c("exp", "control")
)

Then, for example, on a single column you can do:

dplyr::recode(df$gender, !!!codes$gender_values)
# [1] "male"   "female" "male"   "female"
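To make the splicing step concrete, here is a minimal sketch showing that !!! simply expands the vector into separate unnamed arguments. The names gender and labels are illustrative stand-ins for df$gender and codes$gender_values, and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Illustrative stand-ins for df$gender and codes$gender_values
gender <- c(1, 2, 1, 2)
labels <- c("male", "female")

# Splicing the vector ...
spliced <- recode(gender, !!!labels)

# ... should be the same as writing each replacement out by hand
explicit <- recode(gender, "male", "female")

identical(spliced, explicit)
```

Since the match is positional, this only works when the labels are stored in the same order as the codes 1, 2, and so on.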

One way to apply it across columns, given your example data, is to use sapply, which returns a character matrix:

sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes[,paste0(x, "_values")]))

#      gender   condition
# [1,] "male"   "exp"    
# [2,] "female" "exp"    
# [3,] "male"   "control"
# [4,] "female" "control"

(Note that this example assumes that, for every variable "x" in df, the matching column in codes is named "x_values", as in your example data.)
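If a data frame is wanted rather than a matrix, one possible variation is to use lapply and assign the result back into a copy of df. This is only a sketch under the same naming assumption ("x_values"); the name recoded is made up for illustration.

```r
library(dplyr)

df <- data.frame(
  gender = c(1, 2, 1, 2),
  condition = c(1, 1, 2, 2)
)

codes <- data.frame(
  gender_values = c("male", "female"),
  condition_values = c("exp", "control")
)

# Copy df, then overwrite every column with its recoded version.
# `recoded[] <-` keeps the data frame structure and column names.
recoded <- df
recoded[] <- lapply(names(df), function(x) {
  recode(df[[x]], !!!codes[[paste0(x, "_values")]])
})

recoded
```

The lapply call does the same positional recoding as the sapply version; only the container of the result differs.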

Also, if you wanted to keep your codes values exactly as they are, you could do:

codes <- data.frame(
  gender_values= c("`1`='male', `2`='female'"),
  condition_values = c("`1`='exp', `2`='control'")
)

# single column example

dplyr::recode(df$gender, !!!strsplit(codes$gender_values, ",")[[1]])
# [1] "`1`='male'"    " `2`='female'" "`1`='male'"    " `2`='female'"

# multiple columns
sapply(names(df), function(x) dplyr::recode(df[,x], !!!strsplit(codes[,paste0(x, "_values")], ",")[[1]]))

#      gender          condition       
# [1,] "`1`='male'"    "`1`='exp'"     
# [2,] " `2`='female'" "`1`='exp'"     
# [3,] "`1`='male'"    " `2`='control'"
# [4,] " `2`='female'" " `2`='control'"

To clean it up a bit, though, you could preprocess the original codes:

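# Split each spec on commas, trim whitespace, then strip backticks,
# quotes, equals signs and digits so that only the labels remain
codes2 <- lapply(codes, function(x)
  gsub("`|'|=|[[:digit:]]", "", trimws(unlist(strsplit(x, ",")))))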

sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes2[[paste0(x, "_values")]]))

#      gender   condition
# [1,] "male"   "exp"    
# [2,] "female" "exp"    
# [3,] "male"   "control"
# [4,] "female" "control"
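Stripping the keys means the recoding depends on the labels being listed in code order. As an alternative sketch, the original strings can be parsed into named lookup vectors so the key-to-label mapping stays explicit. This version uses base R only; parse_codes and recoded are hypothetical names, and the parser assumes every entry has the form `key`='label' with no commas or equals signs inside the labels.

```r
df <- data.frame(
  gender = c(1, 2, 1, 2),
  condition = c(1, 1, 2, 2)
)

codes <- data.frame(
  gender_values = c("`1`='male', `2`='female'"),
  condition_values = c("`1`='exp', `2`='control'")
)

# Hypothetical helper: turn "`1`='male', `2`='female'" into
# the named vector c("1" = "male", "2" = "female")
parse_codes <- function(spec) {
  pairs <- trimws(strsplit(spec, ",")[[1]])
  keys <- trimws(gsub("`", "", sub("=.*$", "", pairs)))
  vals <- gsub("^'|'$", "", trimws(sub("^[^=]*=", "", pairs)))
  setNames(vals, keys)
}

# Look each value up by its key; values without a key become NA
recoded <- df
recoded[] <- lapply(names(df), function(x) {
  lookup <- parse_codes(codes[[paste0(x, "_values")]])
  unname(lookup[as.character(df[[x]])])
})

recoded
```

Because the lookup is by name, the order of the entries in the spec no longer matters. The named vector returned by parse_codes should also be usable as named replacements in dplyr::recode via !!!, although that is not shown here.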

I would also advise changing "exp" to "exposure" or "expos" or something similar, as exp is also the name of the R function that calculates exponentials, and it's good practice to avoid that kind of confusion!
