I have a tibble that contains a list-column of data frames. In this minimal example, the tibble has only one row:
library(tibble)
df_meta <-
  tibble(my_base_number = 5,
         my_data = list(mtcars))
df_meta
#> # A tibble: 1 x 2
#> my_base_number my_data
#> <dbl> <list>
#> 1 5 <df [32 x 11]>
I want to modify the table inside my_data by adding a new column to it. The data is mtcars, and the new column should hold the log of the mpg column.
Although I can do this:
library(dplyr)
library(purrr)
df_meta %>%
  mutate(my_data_with_log_col = map(.x = my_data, .f = ~ .x %>%
    mutate(log_mpg = map(.x = mpg, .f = ~log(.x, base = 5)))
  )
  )
#> # A tibble: 1 x 3
#> my_base_number my_data my_data_with_log_col
#> <dbl> <list> <list>
#> 1 5 <df [32 x 11]> <df [32 x 12]>
What I really want is for the call to log() inside the inner map() to take the value of its base argument from df_meta$my_base_number rather than the hard-coded 5 in my example.
And although in this 1-row example this simply works:
df_meta %>%
mutate(my_data_with_log_col = map(.x = my_data, .f = ~ .x %>%
mutate(log_mpg = map(.x = mpg, .f = ~log(.x, base = df_meta$my_base_number)))
)
)
consider a slightly more complicated pipeline, where that approach no longer works:
tibble(my_data = rep(list(mtcars), 3)) %>%
add_column(base_number = 1:3) %>%
mutate(my_data_with_log_col = map(.x = my_data, .f = ~ .x %>%
mutate(log_mpg = map(.x = mpg, .f = ~log(.x, base = # <- ???
)))
)
)
So what I'm looking for is a procedure that allows me to "travel" up and down in the nesting hierarchy when I refer to different values that are stored in whatever construct in each row of the "meta-table".
Right now, as I go deeper with map() to work on nested tables, I can't refer to data stored at a higher level. If you wish, I'm looking for something analogous to cd ../../.. when navigating in a terminal.
CodePudding user response:
This is not exactly the answer you are asking for, but I want to share it as an option.
You could travel around using a combination of unnest and nest:
library(dplyr)
library(tidyr)
df_meta %>%
unnest(cols = c(my_data)) %>%
mutate(log_mpg = log(mpg, my_base_number)) %>%
nest(my_data=mpg:log_mpg)
Output after mutate:
my_base_number mpg cyl disp hp drat wt qsec vs am gear carb log_mpg
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5 21 6 160 110 3.9 2.62 16.5 0 1 4 4 1.89
2 5 21 6 160 110 3.9 2.88 17.0 0 1 4 4 1.89
3 5 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 1.94
4 5 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 1.90
5 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 1.82
6 5 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 1.80
7 5 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 1.65
8 5 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 1.98
9 5 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 1.94
10 5 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 1.84
Final output after nest:
my_base_number my_data
<dbl> <list>
1 5 <tibble [32 × 12]>
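The same unnest/nest round trip can be sketched for a meta-table with several rows, which is the case the question struggles with. This is only a sketch under assumptions: df_meta3 and its base values 2, 5 and 10 are made up for illustration, and it relies on every row having a distinct my_base_number, because nest() groups by whatever columns are left outside the nested set.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical three-row meta-table (names and base values are illustrative)
df_meta3 <- tibble(my_base_number = c(2, 5, 10),
                   my_data = rep(list(mtcars), 3))

result <- df_meta3 %>%
  # flatten: my_base_number is repeated next to every inner row
  unnest(cols = c(my_data)) %>%
  # log() is vectorised over base, so each row uses its own base
  mutate(log_mpg = log(mpg, base = my_base_number)) %>%
  # re-nest: rows are grouped by the remaining column, my_base_number
  nest(my_data = mpg:log_mpg)

result
```

If two rows shared the same base number, nest() would merge their data into a single row, so an explicit id column would be needed in that situation.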
CodePudding user response:
Here is the method you asked for. But I actually suggest looking for ways to not be so nested, such as @TarJae's answer.
library(tidyverse)
df_meta <-
tibble(my_data = rep(list(mtcars), 3),
my_base_number = 3:5)
add_log <- function(this_data, this_base) {
  this_data %>% mutate(log_mpg = log(mpg, this_base))
}
# check that it works properly:
mtcars %>% add_log(5)
# now apply to each row in df_meta
df_meta %>%
mutate(my_data_with_log_col = map2(my_data, my_base_number, add_log))
You'll notice that I didn't need to use map in the inner function, because log() is already vectorised. But if I did, I would use map_dbl instead of the map that you used, because you actually want a numeric vector, not a list of vectors of length one. This also shows that maybe you didn't need the double-layered map to begin with.
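To illustrate the difference, here is a small sketch that applies both variants to mtcars directly, with the base hard-coded to 5 as in the question. The object names with_map_dbl and vectorised are just illustrative.

```r
library(dplyr)
library(purrr)

# map_dbl() returns a plain numeric vector, so log_mpg becomes a <dbl> column
with_map_dbl <- mtcars %>%
  mutate(log_mpg = map_dbl(mpg, ~ log(.x, base = 5)))

# the vectorised call gives the same values without any map
vectorised <- mtcars %>%
  mutate(log_mpg = log(mpg, base = 5))

# both columns are numeric and hold the same values
is.numeric(with_map_dbl$log_mpg)
all.equal(with_map_dbl$log_mpg, vectorised$log_mpg)
```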
Also, although an anonymous function is possible, I think it is pretty unreadable for something as complicated as this. That's why I defined the function outside of the map2 call.
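For comparison, the anonymous-function form might look like the sketch below. It uses an explicit function(this_data, this_base) rather than the ~ shorthand, since nested ~ lambdas each define their own .x and the inner one would hide the outer one. With named arguments, the value from the outer level (this_base) stays visible inside the inner mutate() through ordinary lexical scoping, which is the closest thing to the "cd .." behaviour the question asks about.

```r
library(dplyr)
library(purrr)
library(tibble)

df_meta <- tibble(my_data = rep(list(mtcars), 3),
                  my_base_number = 3:5)

res <- df_meta %>%
  mutate(my_data_with_log_col = map2(
    my_data, my_base_number,
    # named arguments: this_base comes from the current row of df_meta
    function(this_data, this_base) {
      mutate(this_data, log_mpg = log(mpg, base = this_base))
    }
  ))

res
```

If more than two columns from the meta-table are needed inside the function, purrr's pmap() takes a list of columns in the same spirit.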