I am processing a large dataset from a questionnaire that contains coded responses in some but not all columns. I would like to replace the coded responses with actual values. The key/dictionary is stored in another database. The complicating factor is that different questions (stored as columns in original dataset) used the same code (typically numeric), but the code has different meanings depending on the column (question).
How can I replace the coded values in the original dataset with different valuse from a corresponding key stored in the dictionary table, but do it by specific column name (also stored in the dictionary table)?
Below is an example of the original dataset and the dictionary table, as well as desired result.
original <- data.frame(
name = c('Jane','Mary','John', 'Billy'),
home = c(1,3,4,2),
car = c('b','b','a','b'),
shirt = c(3,2,1,1),
shoes = c('Black','Black','Black','Brown')
)
keymap <- data.frame(
column_name=c('home','home','home','home','car','car','shirt','shirt','shirt'),
value_old=c('1','2','3','4','a','b','1','2','3'),
value_new=c('Single family','Duplex','Condo','Apartment','Sedan','SUV','White','Red','Blue')
)
result <- data.frame(
name = c('Jane','Mary','John', 'Billy'),
home = c('Single family','Condo','Apartment','Duplex'),
car = c('SUV','SUV','Sedan','SUV'),
shirt = c('Blue','Red','White','White'),
shoes = c('Black','Black','Black','Brown')
)
> original
name home car shirt shoes
1 Jane 1 b 3 Black
2 Mary 3 b 2 Black
3 John 4 a 1 Black
4 Billy 2 b 1 Brown
> keymap
column_name value_old value_new
1 home 1 Single family
2 home 2 Duplex
3 home 3 Condo
4 home 4 Apartment
5 car a Sedan
6 car b SUV
7 shirt 1 White
8 shirt 2 Red
9 shirt 3 Blue
> result
name home car shirt shoes
1 Jane Single family SUV Blue Black
2 Mary Condo SUV Red Black
3 John Apartment Sedan White Black
4 Billy Duplex SUV White Brown
I have tried different approaches using dplyr but have not gotten far as I do not have a robust understanding of the mutate/join syntax.
CodePudding user response:
We may loop across
the unique
values from the 'column_name' column of 'keymap' in the original
, subset the keymap
that matches the column name (cur_column()
), select the columns 2 and 3, deframe
to a named vector and match with the values of the column for replacement
library(dplyr)
library(tibble)
original %>%
mutate(across(all_of(unique(keymap$column_name)), ~
(keymap %>%
filter(column_name == cur_column()) %>%
select(-column_name) %>%
deframe)[as.character(.x)]))
-output
name home car shirt shoes
1 Jane Single family SUV Blue Black
2 Mary Condo SUV Red Black
3 John Apartment Sedan White Black
4 Billy Duplex SUV White Brown
Or an approach in base R
lst1 <- split(with(keymap, setNames(value_new, value_old)), keymap$column_name)
original[names(lst1)] <- Map(\(x, y) y[as.character(x)],
original[names(lst1)], lst1)
CodePudding user response:
Please check below code where we can use the factor
to replace the values in one column with data from another dataframe here in this case with keymap
library(tidyverse)
original %>% mutate(home=factor(home, keymap$value_old, keymap$value_new),
car=factor(car, keymap$value_old, keymap$value_new),
shirt=factor(shirt, keymap$value_old, keymap$value_new)
)
Created on 2023-02-04 with reprex v2.0.2
name home car shirt shoes
1 Jane Single family SUV Condo Black
2 Mary Condo SUV Duplex Black
3 John Apartment Sedan Single family Black
4 Billy Duplex SUV Single family Brown