pmap_df inside mutate applying to entire dataframe not just row

Time:07-07

I have a dataframe where each row holds the arguments I want to pass to a function, one row at a time. The function itself returns a dataframe with a few rows. I would like to keep the arguments and results together in one dataframe by using pmap_df inside mutate, the same way pmap_dbl can be used, to add a new column holding the function's results. With the code below I get a column of nested data, but every row contains the results for all rows, not just the results for that row.

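library(dplyr)   # tibble(), group_by(), mutate(), %>%
library(tidyr)   # nest()
library(purrr)   # pmap_df()

example_function <- function(data, string, ...){
  
  word_one <- paste(data$word_one, string)
  word_two <- paste(data$word_two, string)
  
  # data_frame() is deprecated; tibble() is the current equivalent
  output <- tibble(result_words = c(word_one, word_two))
  
}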

fake_data <- tibble(group_id = rep(c(1, 2), each = 3),
                    word_one = c("hello", "goodbye", "today",
                                 "apple", "banana", "coconut"),
                    word_two = c("my", "name", "is",
                                 "ellie", "good", "morning"))

test <- fake_data %>% 
        group_by(group_id) %>% 
        nest() %>% 
        mutate(string = "not working") %>% 
        mutate(final_output = list(purrr::pmap_df(.l = ., .f = example_function)))

The output looks like:

Rows: 2
Columns: 4
Groups: group_id [2]
$ group_id     <dbl> 1, 2
$ data         <list> [<tbl_df[3 x 2]>], [<tbl_df[3 …
$ string       <chr> "not working", "not working"
$ final_output <list> [<tbl_df[12 x 1]>], [<tbl_df[…

I would like each final_output dataframe to have only 6 rows, corresponding to the inputs in that row's nested data column. Is this possible?

Answer:

With the OP's function, this can be done without pmap at all. First, make the function return its output explicitly:

example_function <- function(data, string, ...){
  
  word_one <- paste(data$word_one, string)
  word_two <- paste(data$word_two, string)
  
  # data_frame() is deprecated; tibble() is the current equivalent
  output <- tibble(result_words = c(word_one, word_two))
  output
  
}

Because nest_by() returns a rowwise data frame, mutate() evaluates each expression one row at a time, so the function can be applied directly:

library(dplyr)
fake_data %>% 
    nest_by(group_id) %>%
    mutate(string = "not working") %>%
    mutate(final_output = list(example_function(data, string)))
# A tibble: 2 × 4
# Rowwise:  group_id
  group_id               data string      final_output    
     <dbl> <list<tibble[,2]>> <chr>       <list>          
1        1            [3 × 2] not working <tibble [6 × 1]>
2        2            [3 × 2] not working <tibble [6 × 1]>
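
For context on why the original attempt gave 12-row tibbles: inside a magrittr pipe, `.` stands for the whole left-hand data frame, not the current row or group, so `pmap_df(.l = ., ...)` calls the function once per row of the entire nested frame and row-binds everything. A minimal sketch of that behaviour, assuming the same `fake_data` and function as above (the names `nested` and `all_rows` are only for illustration):

```r
library(dplyr)
library(tidyr)
library(purrr)

example_function <- function(data, string, ...) {
  tibble(result_words = c(paste(data$word_one, string),
                          paste(data$word_two, string)))
}

fake_data <- tibble(group_id = rep(c(1, 2), each = 3),
                    word_one = c("hello", "goodbye", "today",
                                 "apple", "banana", "coconut"),
                    word_two = c("my", "name", "is",
                                 "ellie", "good", "morning"))

nested <- fake_data %>%
  group_by(group_id) %>%
  nest() %>%
  ungroup() %>%
  mutate(string = "not working")

# One call per row of the whole frame; the two 6-row results are row-bound
all_rows <- pmap_dfr(nested, example_function)
nrow(all_rows)
# 12
```

Wrapping that single 12-row result in list() inside mutate() then stores the same combined tibble for every group, which matches the output in the question.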

With pmap, ungroup first, then iterate over the data and string columns. Inside the lambda, capture the arguments as a list 'x1' and apply the OP's function to the list elements, i.e. x1$data and x1$string

library(purrr)
fake_data %>% 
    nest_by(group_id) %>%
    mutate(string = "not working") %>%
    ungroup() %>%
    mutate(final_output = pmap(across(-group_id), ~ {
        # collect this row's named arguments (data, string) into a list
        x1 <- list(...)
        example_function(x1$data, x1$string)
    }))
# A tibble: 2 × 4
  group_id               data string      final_output    
     <dbl> <list<tibble[,2]>> <chr>       <list>          
1        1            [3 × 2] not working <tibble [6 × 1]>
2        2            [3 × 2] not working <tibble [6 × 1]>
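
Since only two columns are passed to the function, purrr::map2 is a possible alternative to pmap with list(...). The sketch below is not part of the original answer; it assumes the same `fake_data` and function, and uses tidyr's `nest(data = -group_id)` so the frame is never grouped (the name `result` is only for illustration):

```r
library(dplyr)
library(tidyr)
library(purrr)

example_function <- function(data, string, ...) {
  tibble(result_words = c(paste(data$word_one, string),
                          paste(data$word_two, string)))
}

fake_data <- tibble(group_id = rep(c(1, 2), each = 3),
                    word_one = c("hello", "goodbye", "today",
                                 "apple", "banana", "coconut"),
                    word_two = c("my", "name", "is",
                                 "ellie", "good", "morning"))

result <- fake_data %>%
  nest(data = -group_id) %>%             # one row per group_id, plain list column
  mutate(string = "not working",
         # walk the data and string columns in parallel, one call per row
         final_output = map2(data, string, example_function))

# number of rows in each nested result
map_int(result$final_output, nrow)
```

Each element of final_output is built only from that row's data and string, so the arguments and results stay side by side in one dataframe.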