Home > Mobile >  Is there a faster way for false dropping grouping variables in R using dplyr package?
Is there a faster way for false dropping grouping variables in R using dplyr package?

Time:11-17

I have a data frame that looks like this :


library(tidyverse)
date = seq(as.Date("2022/1/1"), by = "day", length.out = 16)
value = c(1.1,1.2,1.3,1.4,1.7,1.4,1.9,1.89,2,2.1,2.2,2.15,3,3.1,3.09,2.08)
variable = c(rep("a",4),rep("b",4),rep("c",4),rep("d",4))
df = tibble(date,value,variable);df


# A tibble: 16 × 3
   date       value variable
   <date>     <dbl> <chr>   
 1 2022-01-01  1.1  a       
 2 2022-01-02  1.2  a       
 3 2022-01-03  1.3  a       
 4 2022-01-04  1.4  a       
 5 2022-01-05  1.7  b       
 6 2022-01-06  1.4  b       
 7 2022-01-07  1.9  b       
 8 2022-01-08  1.89 b       
 9 2022-01-09  2    c       
10 2022-01-10  2.1  c       
11 2022-01-11  2.2  c       
12 2022-01-12  2.15 c       
13 2022-01-13  3    d       
14 2022-01-14  3.1  d       
15 2022-01-15  3.09 d       
16 2022-01-16  2.08 d   

I want to summarize the max and the min of the value column grouped by the variable column and match the corresponding date of these stats.

Doing so (using tidyverse package) I did :

df%>%group_by(variable)%>%
  summarise(MAX = max(value),MIN=min(value))%>%
  pivot_longer(!variable, names_to = "stats", values_to = "value")%>%
  left_join(.,df,by=c("value","variable"))



# A tibble: 8 × 4
  variable stats value date      
  <chr>    <chr> <dbl> <date>    
1 a        MAX    1.4  2022-01-04
2 a        MIN    1.1  2022-01-01
3 b        MAX    1.9  2022-01-07
4 b        MIN    1.4  2022-01-06
5 c        MAX    2.2  2022-01-11
6 c        MIN    2    2022-01-09
7 d        MAX    3.1  2022-01-14
8 d        MIN    2.08 2022-01-16

but I wonder is there a faster way for doing that matching for dates stats ?

CodePudding user response:

Identifying the min and max rows with the very efficient which.min and which.max function and keeping those rows with slice will be more efficient than summarizing and joining. (Should also be more efficient than reordering the whole data frame.)

df %>%
  group_by(variable) %>%
  slice(c(which.min(value), which.max(value))) %>%
  mutate(stat = c("MIN", "MAX")) %>%
  ungroup()
# # A tibble: 8 × 4
#   date       value variable stat 
#   <date>     <dbl> <chr>    <chr>
# 1 2022-01-01  1.1  a        MIN  
# 2 2022-01-04  1.4  a        MAX  
# 3 2022-01-06  1.4  b        MIN  
# 4 2022-01-07  1.9  b        MAX  
# 5 2022-01-09  2    c        MIN  
# 6 2022-01-11  2.2  c        MAX  
# 7 2022-01-16  2.08 d        MIN  
# 8 2022-01-14  3.1  d        MAX  
  • Related