What is the tidyverse way to apply a function designed to take data.frames as input across a grouped

Time:12-10

I've written a function that takes a data frame as its input and uses several of its columns, and I'd like to apply it to each group of a grouped tibble. I think something with purrr::map might be the right approach, but I don't understand what input the various map functions expect. Here's a dummy example:

library(dplyr)
# Sum of the row-wise products of columns A and B
myFun <- function(DF){
  DF %>% mutate(MyOut = (A * B)) %>% pull(MyOut) %>% sum()
}

MyDF <- data.frame(A = 1:5, B = 6:10)
myFun(MyDF)

This works fine. But what if I want to add some grouping?

MyDF <- data.frame(A = 1:100, B = 1:100, Fruit = rep(c("Apple", "Mango"), each = 50))
MyDF %>% group_by(Fruit) %>% summarize(MyVal = myFun(.))

This doesn't work. I get the same value for every group in my data.frame or tibble. I then tried using something with purrr:

MyDF %>% group_by(Fruit) %>% map(.f = myFun)

Apparently, that's expecting character data as input, so that's not it.

This next variation is basically what I need, but the output is a list of lists rather than a tibble with one row for each value of Fruit:

MyDF %>% group_by(Fruit) %>% group_map(~ myFun(.))

CodePudding user response:

We can use the OP's function inside group_modify, which applies a function to each group and expects a data frame back:

library(dplyr)
MyDF %>%
   group_by(Fruit) %>%
   group_modify(~ summarise(.x, MyVal = myFun(.x))) %>%
   ungroup()

Output:

# A tibble: 2 × 2
  Fruit  MyVal
  <chr>  <int>
1 Apple  42925
2 Mango 295425
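
For context on why the summarize attempt in the question returns the same value for every group: with the magrittr pipe, `.` refers to the entire data frame on the left-hand side, not to the current group. The sketch below contrasts that with `pick(everything())`, which, assuming dplyr 1.1.0 or later, returns only the current group's columns (without the grouping column). The names `whole` and `per_group` are only illustrative.

```r
library(dplyr)

# Same function and data as in the question
myFun <- function(DF) {
  DF %>% mutate(MyOut = A * B) %>% pull(MyOut) %>% sum()
}
MyDF <- data.frame(A = 1:100, B = 1:100,
                   Fruit = rep(c("Apple", "Mango"), each = 50))

# `.` is the whole piped data frame, so each group gets the overall total
whole <- MyDF %>%
  group_by(Fruit) %>%
  summarize(MyVal = myFun(.))

# pick(everything()) is evaluated per group (assumes dplyr >= 1.1.0)
per_group <- MyDF %>%
  group_by(Fruit) %>%
  summarize(MyVal = myFun(pick(everything())))
```

On older dplyr versions `pick()` is not available, so the group_modify approach above is the safer choice there.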

Alternatively, use group_map, where .y is a one-row tibble holding the grouping key, and bind the per-group results back together:

MyDF %>%
   group_by(Fruit) %>%
   group_map(~ bind_cols(.y, MyVal = myFun(.))) %>%
   bind_rows()

Output:

# A tibble: 2 × 2
  Fruit  MyVal
  <chr>  <int>
1 Apple  42925
2 Mango 295425
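
Since the question asks about purrr specifically: calling map() directly on a data frame iterates over its columns, so myFun receives a vector rather than a data frame. One possible way around that, sketched here under the assumption that tidyr and purrr are installed, is to nest the data first so that each group becomes one data frame in a list-column, and then map over that column. The name `nested_result` is only illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same function and data as in the question
myFun <- function(DF) {
  DF %>% mutate(MyOut = A * B) %>% pull(MyOut) %>% sum()
}
MyDF <- data.frame(A = 1:100, B = 1:100,
                   Fruit = rep(c("Apple", "Mango"), each = 50))

nested_result <- MyDF %>%
  nest(data = -Fruit) %>%                    # one row per Fruit, list-column `data`
  mutate(MyVal = map_dbl(data, myFun)) %>%   # apply myFun to each group's data frame
  select(-data)                              # drop the nested data, keep Fruit and MyVal
```

map_dbl() returns a plain numeric column, so the result is a tibble with one row per value of Fruit rather than a list.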