I need to recode a factor variable with almost 90 levels. The levels are trait names from a database, which I then need to pivot to get the dataset for analysis. Is there a way to do it automatically, without typing each OldName = NewName pair?
This is how I do it with dplyr for fewer levels:
df$TraitName <- recode_factor(df$TraitName, 'Old Name' = "new.name")
My idea was to use a key data frame with a column of old names and a column of corresponding new names, but I cannot figure out how to feed it to recode.
CodePudding user response:
One way would be a lookup table, a join, and coalesce (to get the first non-NA value):
my_data <- data.frame(letters = letters[1:6])
levels_to_change <- data.frame(letters = letters[4:5],
                               new_letters = LETTERS[4:5])
library(dplyr)
my_data %>%
  left_join(levels_to_change) %>%
  mutate(new = coalesce(new_letters, letters))
Result
Joining, by = "letters"
  letters new_letters new
1       a        <NA>   a
2       b        <NA>   b
3       c        <NA>   c
4       d           D   D
5       e           E   E
6       f        <NA>   f
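Adapted to the question's setup, the same join-and-coalesce idea might look like the sketch below. The data, the key data frame `key`, and its column names `old` and `new` are made up for illustration; only `TraitName` comes from the question. It assumes the column is a factor, so it is converted to character before the join and back to a factor afterwards.

```r
library(dplyr)

# Hypothetical data: a factor column of trait names
df <- data.frame(TraitName = factor(c("Old Name", "Other Trait", "Old Name")))

# Hypothetical key: one row per level that should be renamed
key <- data.frame(old = "Old Name", new = "new.name",
                  stringsAsFactors = FALSE)

recoded <- df %>%
  mutate(TraitName = as.character(TraitName)) %>%   # join on character, not factor
  left_join(key, by = c("TraitName" = "old")) %>%   # adds `new`, NA where no match
  mutate(TraitName = factor(coalesce(new, TraitName))) %>%
  select(-new)                                      # drop the helper column
```

Levels that have no row in the key keep their original name, so the key only needs to list the levels you actually want to change.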
CodePudding user response:
You could quite easily create a named vector from your lookup table and pass it to recode_factor using splicing (!!!). It may well be faster than a join.
library(tidyverse)
# test data
df <- tibble(TraitName = c("a", "b", "c"))
# Make a lookup table from your own data.
# You'll bind your two columns here instead.
# Keep the column order (old, then new): deframe() uses the first
# column as names and the second as values.
# The column names don't matter.
lookup <- tibble(old = c("a", "b", "c"), new = c("aa", "bb", "cc"))
# Convert to a named vector and splice it into recode_factor
df <-
  df |>
  mutate(TraitNameRecode = recode_factor(TraitName, !!!deframe(lookup)))
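If you would rather not depend on dplyr for this step, a base R sketch of the same lookup idea is to rewrite the factor's levels directly with match(). This is an alternative technique, not what the answer above does, and the example data and the `lookup` column names are assumptions made up for illustration.

```r
# Hypothetical data: a factor with one level ("c") that is not in the lookup
TraitName <- factor(c("a", "b", "c", "a"))

# Hypothetical lookup table: old level names and their replacements
lookup <- data.frame(old = c("a", "b"), new = c("aa", "bb"),
                     stringsAsFactors = FALSE)

# Position of each existing level in the lookup (NA where there is no entry)
idx <- match(levels(TraitName), lookup$old)

# Use the new name where one exists, otherwise keep the old level
levels(TraitName) <- ifelse(is.na(idx), levels(TraitName), lookup$new[idx])
```

Because only the levels are touched, the cost depends on the number of levels (about 90 in the question) rather than on the number of rows.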