Use "distinct" "group_by" "summarise" two times in one pipe

Time:06-28

I want to do something like

df1 <- iris %>% distinct(Species, .keep_all = TRUE) %>% group_by(Petal.Width) %>% summarise(Sepal.Length.mean1=mean(Sepal.Length), .groups = "drop")

df2 <- iris %>% distinct(Species, Petal.Width, .keep_all = TRUE) %>% group_by(Petal.Width) %>% summarise(Sepal.Length.mean2 =mean(Sepal.Length), .groups = "drop")

inner_join(df1, df2, by="Petal.Width") 

But this is tedious to read because of the repetition. Is it possible to do it all in one pipe? I cannot recover the original dataset after distinct(), so I wonder whether there is an alternative.

CodePudding user response:

A possible solution is to first create a function and then call it once per summary:

library(tidyverse)

f <- function(df = iris, var1 = Species, var2 = Petal.Width,
              var3 = Sepal.Length, i)
{
  # Capture var3 so its name can be used to build the output column name
  x <- enquo(var3)

  # df is an ordinary argument, so it needs no {{ }}
  df %>%
    distinct({{var1}}, .keep_all = TRUE) %>%
    group_by({{var2}}) %>%
    summarise(!!str_c(quo_name(x), ".mean", i, sep = "") := mean({{var3}}),
              .groups = "drop")
}

inner_join(f(i = 1), f(i = 2), by="Petal.Width")

#> # A tibble: 3 × 3
#>   Petal.Width Sepal.Length.mean1 Sepal.Length.mean2
#>         <dbl>              <dbl>              <dbl>
#> 1         0.2                5.1                5.1
#> 2         1.4                7                  7  
#> 3         2.5                6.3                6.3
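
Note that both calls above run the same distinct() on Species only, which is why the two mean columns are identical. A minimal sketch of a variant that forwards the distinct() columns through `...`, so the second call can deduplicate on Species and Petal.Width as in the question. The name mean_after_distinct and its arguments are illustrative, not part of the answer above:

```r
library(dplyr)

# Hypothetical helper: deduplicate on the columns passed via `...`,
# then average Sepal.Length within each Petal.Width.
# `i` is only used as the suffix of the output column name.
mean_after_distinct <- function(df, ..., i) {
  df %>%
    distinct(..., .keep_all = TRUE) %>%
    group_by(Petal.Width) %>%
    summarise(!!paste0("Sepal.Length.mean", i) := mean(Sepal.Length),
              .groups = "drop")
}

result <- inner_join(
  mean_after_distinct(iris, Species, i = 1),               # like df1
  mean_after_distinct(iris, Species, Petal.Width, i = 2),  # like df2
  by = "Petal.Width"
)
```

This still makes two passes over iris, but the repeated group_by() and summarise() steps live in one place.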

CodePudding user response:

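A workaround is to pipe into a braced expression ({}), inside which . refers to the piped data, so both distinct() results can be built from the same input. Here is the beginning of the solution: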

iris %>% {
  df1 <- distinct(., Species, .keep_all = TRUE) 
  df2 <- distinct(., Species, Petal.Width, .keep_all = TRUE)
  list(df1, df2)} %>% 
  map(~ group_by(.x, Petal.Width)) # SOLUTION TO BE COMPLETED
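
One way the pipeline above could be finished, as a minimal sketch: summarise each list element with a column name that depends on its position, then join the pieces. The helper summarise_by_width is an illustrative name, and using purrr's imap() and reduce() here is an assumption about how the answer was meant to continue:

```r
library(dplyr)
library(purrr)

# Illustrative helper (not from the original answer): mean Sepal.Length per
# Petal.Width, with the list position i used as the column-name suffix.
summarise_by_width <- function(df, i) {
  df %>%
    group_by(Petal.Width) %>%
    summarise(!!paste0("Sepal.Length.mean", i) := mean(Sepal.Length),
              .groups = "drop")
}

joined <- iris %>% {
  list(
    distinct(., Species, .keep_all = TRUE),
    distinct(., Species, Petal.Width, .keep_all = TRUE)
  )
} %>%
  imap(summarise_by_width) %>%            # second argument is 1, 2 for an unnamed list
  reduce(inner_join, by = "Petal.Width")  # join the two summaries on Petal.Width
```

Because reduce() joins pairwise, adding a third distinct() variant to the list would only require one more list element.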