Home > database >  How do I update the values of a dataset using matched values in a legend?
How do I update the values of a dataset using matched values in a legend?

Time:06-03

I have a "legend" with a key and several columns:

library(dplyr)

(legend <- tibble(key = 1:3, 
               value_1 = c("x", "y", "z"),
               value_2 = c("fg", "d", "f"),
               value_3 = c("dsf", "sdf", "sdf")))
#> # A tibble: 3 × 4
#>     key value_1 value_2 value_3
#>   <int> <chr>   <chr>   <chr>  
#> 1     1 x       fg      dsf    
#> 2     2 y       d       sdf    
#> 3     3 z       f       sdf

In my real data, I have the same columns but they take the values of the legend key.

(data = tibble(value_1 = c(1, 3, 3),
              value_2 = c(2, 2, 2),
              value_3 = c(1, 2, 3)))
#> # A tibble: 3 × 3
#>   value_1 value_2 value_3
#>     <dbl>   <dbl>   <dbl>
#> 1       1       2       1
#> 2       3       2       2
#> 3       3       2       3

It is possible to match my datasets together in a way to "update" my dataset using the legend?

# expected output
  tibble(value_1 = c("x", "z", "z"),
         value_2 = c("d", "d", "d"),
         value_3 = c("dsf", "sdf", "sdf"))
#> # A tibble: 3 × 3
#>   value_1 value_2 value_3
#>   <chr>   <chr>   <chr>  
#> 1 x       d       dsf    
#> 2 z       d       sdf    
#> 3 z       d       sdf

Created on 2022-06-02 by the reprex package (v2.0.1)

CodePudding user response:

As these are position indexes in 'data' columns, loop across the 'value_' columns of 'legend', get the corresponding column based on the column name (cur_column()) from 'data', use that as numeric index to subset the values of the 'legend' column

library(dplyr)
legend <- legend %>% 
   mutate(across(starts_with('value_'), ~ .x[data[[cur_column()]]]))

-output

legend
# A tibble: 3 × 4
    key value_1 value_2 value_3
  <int> <chr>   <chr>   <chr>  
1     1 x       d       dsf    
2     2 z       d       sdf    
3     3 z       d       sdf    

CodePudding user response:

You can use purrr as well. This should be quicker than mutate(), but also assumes that the values are keyed equal to their row number.

library(purrr)

map_dfc(set_names(names(data)),
        ~ legend[[.x]][data[[.x]]])
# A tibble: 3 x 3
  value_1 value_2 value_3
  <chr>   <chr>   <chr>  
1 x       d       dsf    
2 z       d       sdf    
3 z       d       sdf 
  • Related