I have a data frame containing dates and associated values for three groups as follows.
library(lubridate)
library(magrittr)
library(dplyr)
data <-
data.frame(group = rep(c("a", "b", "c"), each = 5),
date = rep(seq(ymd(20200101), ymd(20200105), by = 1),
times = 3),
value = runif(15))
I would like to perform a grouped filter operation in order to extract subsets of the data by date, where the start date varies by group.
start_date <- list(a = ymd(20200102), b = ymd(20200104), c = ymd(20200103))
I'd like to use the group name to index into the start date list. I try to do the operation as follows, but I get an error message.
data %>%
group_by(group) %>%
filter(date >= start_date[[group]])
Error in app$vspace(new_style$`margin-top` %||% 0) :
attempt to apply non-function
What am I doing wrong?
(The alternative to the above procedure is to set the start dates up as a data frame, join the data frame to data
, and perform an ungrouped filter. This works as intended, but it's less elegant (in my opinion) and I'd prefer to keep the start dates as a list for other reasons.)
start_date_2 <-
data.frame(group = c("a", "b", "c"),
start_date = c(ymd(20200102), ymd(20200104), c = ymd(20200103)))
data %>%
left_join(start_date_2, by = "group") %>%
filter(date >= start_date) %>%
select(-start_date)
CodePudding user response:
We can use purrr
:
library(lubridate)
library(magrittr)
library(dplyr)
library(purrr)
data %>%
group_split(group) %>%
map2(., start_date, ~ filter(.x , date >= .y)) %>%
bind_rows()
#> # A tibble: 9 x 3
#> group date value
#> <fct> <date> <dbl>
#> 1 a 2020-01-02 0.0756
#> 2 a 2020-01-03 0.0535
#> 3 a 2020-01-04 0.338
#> 4 a 2020-01-05 0.332
#> 5 b 2020-01-04 0.992
#> 6 b 2020-01-05 0.356
#> 7 c 2020-01-03 0.751
#> 8 c 2020-01-04 0.949
#> 9 c 2020-01-05 0.778
CodePudding user response:
We could use a non-equi join
library(data.table)
setDT(data)[data.table(date = do.call(c, start_date),
group = names(start_date)), on = .(group, date >= date)]
group date value
<char> <Date> <num>
1: a 2020-01-02 0.003896343
2: a 2020-01-02 0.052198616
3: a 2020-01-02 0.866560180
4: a 2020-01-02 0.576245169
5: b 2020-01-04 0.531409336
6: b 2020-01-04 0.383936672
7: c 2020-01-03 0.041919498
8: c 2020-01-03 0.363742695
9: c 2020-01-03 0.856596967
CodePudding user response:
We can access the name of the current group(s) by using cur_group()
. This gives us a tibble with every group being a column containing the current group name, so we need to subset it with $group
.
library(dplyr)
data %>%
group_by(group) %>%
filter(date >= start_date[[cur_group()$group]])
#> # A tibble: 9 x 3
#> # Groups: group [3]
#> group date value
#> <chr> <date> <dbl>
#> 1 a 2020-01-02 0.225
#> 2 a 2020-01-03 0.345
#> 3 a 2020-01-04 0.110
#> 4 a 2020-01-05 0.0951
#> 5 b 2020-01-04 0.356
#> 6 b 2020-01-05 0.345
#> 7 c 2020-01-03 0.0973
#> 8 c 2020-01-04 0.344
#> 9 c 2020-01-05 0.418
Created on 2022-03-29 by the reprex package (v0.3.0)
CodePudding user response:
Another option could be:
data %>%
filter(date >= start_date[match(group, names(start_date))])
group date value
1 a 2020-01-02 0.29910439
2 a 2020-01-03 0.08603731
3 a 2020-01-04 0.78913465
4 a 2020-01-05 0.40025848
5 b 2020-01-04 0.87568202
6 b 2020-01-05 0.21065155
7 c 2020-01-03 0.47403618
8 c 2020-01-04 0.74996630
9 c 2020-01-05 0.08386366