Traveling between nesting levels in a tibble: how to refer to data stored in upper levels of nesting

Time:12-19

I have a tibble that contains a list-column of data frames. In this minimal example, the tibble has only one row:

library(tibble)

df_meta <- 
  tibble(my_base_number = 5,
         my_data = list(mtcars))

df_meta
#> # A tibble: 1 x 2
#>   my_base_number my_data       
#>            <dbl> <list>        
#> 1              5 <df [32 x 11]>

I want to modify the data frame inside my_data by adding a new column. The data is mtcars, and the new column should hold the logarithm of the mpg column.

Although I can do this:

library(dplyr)
library(purrr)

df_meta %>%
  mutate(my_data_with_log_col = map(.x = my_data, .f = ~ .x %>% 
                                                         mutate(log_mpg = map(.x = mpg, .f = ~log(.x, base = 5)))
                                    )
         )
#> # A tibble: 1 x 3
#>   my_base_number my_data        my_data_with_log_col
#>            <dbl> <list>         <list>              
#> 1              5 <df [32 x 11]> <df [32 x 12]>     

What I really want is for the call to log() inside the inner map() to take its base argument from df_meta$my_base_number rather than from the hard-coded 5 in my example.

And although in this 1-row example this simply works:

df_meta %>%
  mutate(my_data_with_log_col = map(.x = my_data, .f = ~ .x %>% 
                                                         mutate(log_mpg = map(.x = mpg, .f = ~log(.x, base = df_meta$my_base_number)))
                                    )
         )

consider a slightly more complicated pipeline, where it no longer works:

tibble(my_data = rep(list(mtcars), 3)) %>%
  add_column(base_number = 1:3) %>%
  mutate(my_data_with_log_col = map(.x = my_data, .f = ~ .x %>% 
                                      mutate(log_mpg = map(.x = mpg, .f = ~log(.x, base =  # <- ???
                                                                                 )))
                                    )
  )

So what I'm looking for is a procedure that lets me "travel" up and down the nesting hierarchy when I refer to values stored in whatever construct sits in each row of the "meta-table".

Right now, as I go deeper with map() to work on nested tables, I can't refer to data stored at an upper level. If you wish, I'm looking for something analogous to cd ../../.. when navigating in a terminal.

CodePudding user response:

This is not exactly the answer you are asking for. I want to share it as an option!

You could travel around using the combination of unnest and nest:

library(dplyr)
library(tidyr)

df_meta %>% 
  unnest(cols = c(my_data)) %>% 
  mutate(log_mpg = log(mpg, my_base_number)) %>% 
  nest(my_data=mpg:log_mpg)

Output after mutate (first 10 rows):

  my_base_number   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb log_mpg
            <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
 1              5  21       6  160    110  3.9   2.62  16.5     0     1     4     4    1.89
 2              5  21       6  160    110  3.9   2.88  17.0     0     1     4     4    1.89
 3              5  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1    1.94
 4              5  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1    1.90
 5              5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    1.82
 6              5  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1    1.80
 7              5  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    1.65
 8              5  24.4     4  147.    62  3.69  3.19  20       1     0     4     2    1.98
 9              5  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2    1.94
10              5  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    1.84

Final output after nest:

  my_base_number my_data           
           <dbl> <list>            
1              5 <tibble [32 × 12]>
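The same round trip can be sketched for the multi-row case from the question. This is only an illustration under a few assumptions: the column names row_id and base_number are hypothetical, and the bases 3:5 are example values. Because nest() groups by the columns that are left un-nested, the row_id column keeps rows apart even if two of them happened to share the same base, and a negative selection avoids depending on column order:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical multi-row meta-table: one base per nested data frame.
# row_id is an illustrative helper column, not part of the original question.
df_multi <- tibble(row_id      = 1:3,
                   base_number = 3:5,
                   my_data     = rep(list(mtcars), 3))

result <- df_multi %>%
  unnest(cols = my_data) %>%                        # flatten: base_number is now a plain column
  mutate(log_mpg = log(mpg, base = base_number)) %>% # row-wise base, fully vectorised
  nest(my_data = -c(row_id, base_number))            # fold everything else back into a list-column

result
```

Each nested table in result then carries a log_mpg column computed with the base from its own row.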

CodePudding user response:

Here is the method you asked for. But I actually suggest looking for ways to not be so nested, such as @TarJae's answer.

library(tidyverse)

df_meta <- 
    tibble(my_data = rep(list(mtcars), 3),
           my_base_number = 3:5)

add_log <- function(this_data, this_base){
    this_data %>% mutate(log_mpg = log(mpg, this_base))
}

# check that it works properly:
mtcars %>% add_log(5)

# now apply to each row in df_meta
df_meta %>% 
    mutate(my_data_with_log_col = map2(my_data, my_base_number, add_log))

You'll notice that I didn't need map in the inner function, because log() is already vectorised. If I did need it, I would use map_dbl instead of the map that you used, because you actually want a numeric vector, not a list of length-one vectors. This also suggests that you may not have needed the double-layered map to begin with.
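To make the difference concrete, here is a small sketch on the first three mpg values (the object names are only illustrative). map() returns a list, map_dbl() returns a plain numeric vector, and the vectorised call gives the same numbers without any mapping:

```r
library(purrr)

mpg_head <- head(mtcars$mpg, 3)

as_list    <- map(mpg_head, ~ log(.x, base = 5))      # list of three length-one numerics
as_dbl     <- map_dbl(mpg_head, ~ log(.x, base = 5))  # numeric vector of length three
vectorised <- log(mpg_head, base = 5)                 # same values, no map needed

str(as_list)
str(as_dbl)
all.equal(as_dbl, vectorised)
```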

Also, although an anonymous function is possible, I think it is pretty unreadable for something as complicated as this. That's why I defined the function outside of the map2.
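For comparison, a sketch of the anonymous-function variant, assuming the same three-row df_meta as above. One pitfall with nested ~ lambdas is that the inner .x shadows the outer one, so the outer value is no longer reachable by that name. Giving the outer function named arguments (this_data and this_base here, both illustrative names) keeps the upper-level value visible inside the inner map_dbl():

```r
library(dplyr)
library(purrr)
library(tibble)

df_meta <- tibble(my_data = rep(list(mtcars), 3),
                  my_base_number = 3:5)

res_inner <- df_meta %>%
  mutate(my_data_with_log_col = map2(
    my_data, my_base_number,
    # Named arguments instead of .x / .y, so the inner lambda cannot shadow them
    function(this_data, this_base) {
      this_data %>%
        mutate(log_mpg = map_dbl(mpg, ~ log(.x, base = this_base)))
    }
  ))

res_inner
```

The inner map_dbl() is kept only to show how the outer value stays in scope; log(mpg, base = this_base) on its own would produce the same column.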
