I have a df which has two columns, time
and val
. The df is arranged by time. I want to filter out all the rows from the max value, in this case it is 1.29
. I have provided example below:
library(tidyverse)
library(lubridate)
# This is the entire df
df1 <- tibble::tribble(
~date, ~val,
"2021-09-16 11:02:45", 1.21,
"2021-09-16 11:02:45", 1.21,
"2021-09-16 11:02:45", 1.21,
"2021-09-16 11:02:45", 1.22,
"2021-09-16 11:02:45", 1.22,
"2021-09-16 11:02:45", 1.22,
"2021-09-16 11:02:37", 1.22,
"2021-09-16 10:59:29", 1.29,
"2021-09-16 10:59:14", 1.29,
"2021-09-16 10:59:14", 1.28,
"2021-09-16 10:59:14", 1.28,
"2021-09-16 10:58:17", 1.28,
"2021-09-16 10:58:17", 1.28,
"2021-09-16 10:58:05", 1.26,
"2021-09-16 10:58:05", 1.26,
"2021-09-16 10:58:05", 1.23,
"2021-09-16 10:57:16", 1.23
) %>%
mutate(date = ymd_hms(date))
# This is the outcome I am looking for
tibble::tribble(
~date, ~val,
"2021-09-16 10:59:29", 1.29,
"2021-09-16 10:59:14", 1.29,
"2021-09-16 10:59:14", 1.28,
"2021-09-16 10:59:14", 1.28,
"2021-09-16 10:58:17", 1.28,
"2021-09-16 10:58:17", 1.28,
"2021-09-16 10:58:05", 1.26,
"2021-09-16 10:58:05", 1.26,
"2021-09-16 10:58:05", 1.23,
"2021-09-16 10:57:16", 1.23
) %>%
mutate(date = ymd_hms(date))
How to do this efficiently, any ideas?
CodePudding user response:
If I understood correctly this might solve your problem
library(dplyr)
df1 %>%
filter(date <= first(date[val == max(val)]))
# A tibble: 10 x 2
date val
<dttm> <dbl>
1 2021-09-16 10:59:29 1.29
2 2021-09-16 10:59:14 1.29
3 2021-09-16 10:59:14 1.28
4 2021-09-16 10:59:14 1.28
5 2021-09-16 10:58:17 1.28
6 2021-09-16 10:58:17 1.28
7 2021-09-16 10:58:05 1.26
8 2021-09-16 10:58:05 1.26
9 2021-09-16 10:58:05 1.23
10 2021-09-16 10:57:16 1.23
CodePudding user response:
Here are couple of other dplyr
options using match
.
- Using
slice
-
library(dplyr)
df1 %>% slice(match(max(val), val):n())
# date val
# <dttm> <dbl>
# 1 2021-09-16 10:59:29 1.29
# 2 2021-09-16 10:59:14 1.29
# 3 2021-09-16 10:59:14 1.28
# 4 2021-09-16 10:59:14 1.28
# 5 2021-09-16 10:58:17 1.28
# 6 2021-09-16 10:58:17 1.28
# 7 2021-09-16 10:58:05 1.26
# 8 2021-09-16 10:58:05 1.26
# 9 2021-09-16 10:58:05 1.23
#10 2021-09-16 10:57:16 1.23
- Using
filter
df1 %>% filter(row_number() >= match(max(val), val))
You may also use base R to do the same.
df1[match(max(df1$val), df1$val):nrow(df1), ]
CodePudding user response:
We can use
library(dplyr)
df1 %>%
filter(row_number() >=which.max(val))
-output
# A tibble: 10 x 2
date val
<dttm> <dbl>
1 2021-09-16 10:59:29 1.29
2 2021-09-16 10:59:14 1.29
3 2021-09-16 10:59:14 1.28
4 2021-09-16 10:59:14 1.28
5 2021-09-16 10:58:17 1.28
6 2021-09-16 10:58:17 1.28
7 2021-09-16 10:58:05 1.26
8 2021-09-16 10:58:05 1.26
9 2021-09-16 10:58:05 1.23
10 2021-09-16 10:57:16 1.23
CodePudding user response:
df1 %>%
filter(cumsum(val == max(val)) >= 1)
Here, we keep rows for which the cumulative times we've reached the max value is at least one.
I assume here that the data is sorted by date already.