I want to recode values in one dataset based on values in another dataset. My overall goal is to apply recode
across multiple columns of the dataframe.
Data:
df <- data.frame(
gender=c(1,2,1,2),
condition=c(1,1,2,2)
)
df
gender condition
1 1 1
2 2 1
3 1 2
4 2 2
Other dataset:
codes <- data.frame(
gender_values= c("`1`='male', `2`='female'"),
condition_values = c("`1`='exp', `2`='control'")
)
codes
gender_values condition_values
1 `1`='male', `2`='female' `1`='exp', `2`='control'
Attempt:
df %>%
dplyr::mutate(
gender= dplyr::recode(gender, cat(noquote(codes[1,"gender_values"])), .default = NA_character_)
)
`1`='male', `2`='female' gender condition
1 <NA> 1
2 <NA> 1
3 <NA> 2
4 <NA> 2
Wanted:
gender condition
1 male exp
2 female exp
3 male control
4 female control
CodePudding user response:
If you want to use dplyr::recode
, you can exploit the splice operator !!!
to help out here and evaluate the values in codes
. It helps if you simplify your codes
data:
codes <- data.frame(
gender_values = c("male", "female"),
condition_values = c("exp", "control")
)
Then, for example, on a single column you can do:
dplyr::recode(df$gender, !!!codes$gender_values)
# [1] "male" "female" "male" "female"
One way to apply it across columns given your example data is to use sapply
:
sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes[,paste0(x, "_values")]))
# gender condition
# [1,] "male" "exp"
# [2,] "female" "exp"
# [3,] "male" "control"
# [4,] "female" "control"
(Note this specific example assumed all column names in codes
for variable “x” (in df
) is “x_values”, as in your example data)
Also, if you wanted to keep you codes
values exactly as is, you could do:
codes <- data.frame(
gender_values= c("`1`='male', `2`='female'"),
condition_values = c("`1`='exp', `2`='control'")
)
# single column example
dplyr::recode(df$gender, !!!strsplit(codes$gender_values, ",")[[1]])
# [1] "`1`='male'" " `2`='female'" "`1`='male'" " `2`='female'"
# multiple columns
sapply(names(df), function(x) dplyr::recode(df[,x], !!!strsplit(codes[,paste0(x, "_values")], ",")[[1]]))
# gender condition
# [1,] "`1`='male'" "`1`='exp'"
# [2,] " `2`='female'" "`1`='exp'"
# [3,] "`1`='male'" " `2`='control'"
# [4,] " `2`='female'" " `2`='control'"
To clean it up a bit, though, you could preprocess the original codes:
codes2 <- lapply(codes[], function(x)
gsub("`|'|=|[[:digit:]] ", "", trimws(unlist(strsplit(x, ",")))))
sapply(names(df), function(x) dplyr::recode(df[,x], !!!codes2[[paste0(x, "_values")]]))
# gender condition
# [1,] "male" "exp"
# [2,] "female" "exp"
# [3,] "male" "control"
# [4,] "female" "control"
I would also advise changing "exp" to "exposure" or "expos" or something else, as exp
is a function in R to calculate exponentials, and its good practice not to confuse things!