I have a data frame that looks like this :
library(tidyverse)
date = seq(as.Date("2022/1/1"), by = "day", length.out = 16)
value = c(1.1,1.2,1.3,1.4,1.7,1.4,1.9,1.89,2,2.1,2.2,2.15,3,3.1,3.09,2.08)
variable = c(rep("a",4),rep("b",4),rep("c",4),rep("d",4))
df = tibble(date,value,variable);df
# A tibble: 16 × 3
date value variable
<date> <dbl> <chr>
1 2022-01-01 1.1 a
2 2022-01-02 1.2 a
3 2022-01-03 1.3 a
4 2022-01-04 1.4 a
5 2022-01-05 1.7 b
6 2022-01-06 1.4 b
7 2022-01-07 1.9 b
8 2022-01-08 1.89 b
9 2022-01-09 2 c
10 2022-01-10 2.1 c
11 2022-01-11 2.2 c
12 2022-01-12 2.15 c
13 2022-01-13 3 d
14 2022-01-14 3.1 d
15 2022-01-15 3.09 d
16 2022-01-16 2.08 d
I want to summarize the max and the min of the value column grouped by the variable column and match the corresponding date of these stats.
Doing so (using tidyverse package) I did :
df%>%group_by(variable)%>%
summarise(MAX = max(value),MIN=min(value))%>%
pivot_longer(!variable, names_to = "stats", values_to = "value")%>%
left_join(.,df,by=c("value","variable"))
# A tibble: 8 × 4
variable stats value date
<chr> <chr> <dbl> <date>
1 a MAX 1.4 2022-01-04
2 a MIN 1.1 2022-01-01
3 b MAX 1.9 2022-01-07
4 b MIN 1.4 2022-01-06
5 c MAX 2.2 2022-01-11
6 c MIN 2 2022-01-09
7 d MAX 3.1 2022-01-14
8 d MIN 2.08 2022-01-16
but I wonder is there a faster way for doing that matching for dates stats ?
CodePudding user response:
Identifying the min and max rows with the very efficient which.min
and which.max
function and keeping those rows with slice
will be more efficient than summarizing and joining. (Should also be more efficient than reordering the whole data frame.)
df %>%
group_by(variable) %>%
slice(c(which.min(value), which.max(value))) %>%
mutate(stat = c("MIN", "MAX")) %>%
ungroup()
# # A tibble: 8 × 4
# date value variable stat
# <date> <dbl> <chr> <chr>
# 1 2022-01-01 1.1 a MIN
# 2 2022-01-04 1.4 a MAX
# 3 2022-01-06 1.4 b MIN
# 4 2022-01-07 1.9 b MAX
# 5 2022-01-09 2 c MIN
# 6 2022-01-11 2.2 c MAX
# 7 2022-01-16 2.08 d MIN
# 8 2022-01-14 3.1 d MAX