I have a table of cases. Each row represents a case. A case has a Case_id, a start-date, a complete-date and other attributes. A case is active between its start and complete date. Here's a reprex:
library(tidyverse)
set.seed(42)
Case_id <- seq(1:100)
Start <- sample(seq(as.Date("2022-08-01"), as.Date("2022-12-01"), by = "day"), 100, replace = TRUE)
Complete <- Start sample(0:60, 1, replace = TRUE)
Other_attributes <- sample(c("Red", "Blue", "Green"), 100, replace = TRUE)
Cases <- tibble(Case_id, Start, Complete, Other_attributes)
head(Cases)
#> # A tibble: 6 × 4
#> Case_id Start Complete Other_attributes
#> <int> <date> <date> <chr>
#> 1 1 2022-09-18 2022-10-11 Red
#> 2 2 2022-11-09 2022-12-02 Blue
#> 3 3 2022-10-04 2022-10-27 Red
#> 4 4 2022-08-25 2022-09-17 Blue
#> 5 5 2022-10-13 2022-11-05 Green
#> 6 6 2022-11-08 2022-12-01 Red
Created on 2023-01-01 with reprex v2.0.2
Ultimately I want to show, how the number of active cases develops over time, possibly grouped by any combination of other attributes. The development over time is represented by weekly reference-dates. In this example I put the reference-dates into a vector, but they could also be organized into a tibble with one column.
Reference_dates <- seq(as.Date("2022-09-04"), as.Date("2022-12-31"), by = "weeks")
To achieve my ultimate objective, I would first create a long table. I iterate over each element of my weekly reference-dates and filter the cases that are active at each reference-date. The filtering of active cases at the first reference-date can be illustrated like this:
Cases %>% filter(Reference_dates[1] >= Start & Reference_dates[1] <= Complete)
The filtered Cases should also be enriched by a new column consisting of the corresponding reference-date.
I repeat this for each reference-date and then row-bind the filtered and enriched cases into a much longer flat table. My problem is how to create this long table.
I want to avoid for
loops and use functional programming instead to create the long table. I've looked into list columns with subsequent unnesting and purrr::map()
with subsequent purrr::list_rbind()
, but I don't know how to start.
Any ideas?
Once this problem is solved, I could then easily use rpivottable()
or group_by(Reference_dates, Other_attributes)
to count the number of active cases per reference-date and also group it by any combination of other attributes.
My true number of cases is actually much larger and I have several other attributes. Please advise if there's a more efficient way to achieve my ultimate objective than creating the long table first.
CodePudding user response:
Using purrr::map_df
you could loop over your reference dates, create your filtered datasets and bind them into one df in one step like so:
library(tidyverse)
map_df(Reference_dates, function(x) {
Cases %>%
filter(x >= Start & x <= Complete) %>%
mutate(Reference_Date = x)
})
#> # A tibble: 275 × 5
#> Case_id Start Complete Other_attributes Reference_Date
#> <int> <date> <date> <chr> <date>
#> 1 4 2022-08-25 2022-09-17 Blue 2022-09-04
#> 2 7 2022-08-18 2022-09-10 Red 2022-09-04
#> 3 11 2022-08-24 2022-09-16 Green 2022-09-04
#> 4 17 2022-08-20 2022-09-12 Red 2022-09-04
#> 5 18 2022-08-26 2022-09-18 Red 2022-09-04
#> 6 24 2022-08-27 2022-09-19 Red 2022-09-04
#> 7 31 2022-09-03 2022-09-26 Blue 2022-09-04
#> 8 38 2022-08-24 2022-09-16 Blue 2022-09-04
#> 9 39 2022-08-30 2022-09-22 Red 2022-09-04
#> 10 41 2022-08-15 2022-09-07 Red 2022-09-04
#> # … with 265 more rows
CodePudding user response:
Not a purrr
solution, but a dplyr
solution.
library(tidyverse)
columnsToAdd <- as.character(Reference_dates)
Cases[,columnsToAdd] <- ""
Cases |>
pivot_longer(cols = starts_with('2022')) |>
mutate(name = as.Date(name)) |>
select(-value) |>
filter((name >= Start & name <= Complete))
#> # A tibble: 275 × 5
#> Case_id Start Complete Other_attributes name
#> <int> <date> <date> <chr> <date>
#> 1 1 2022-09-18 2022-10-11 Red 2022-09-18
#> 2 1 2022-09-18 2022-10-11 Red 2022-09-25
#> 3 1 2022-09-18 2022-10-11 Red 2022-10-02
#> 4 1 2022-09-18 2022-10-11 Red 2022-10-09
#> 5 2 2022-11-09 2022-12-02 Blue 2022-11-13
#> 6 2 2022-11-09 2022-12-02 Blue 2022-11-20
#> 7 2 2022-11-09 2022-12-02 Blue 2022-11-27
#> 8 3 2022-10-04 2022-10-27 Red 2022-10-09
#> 9 3 2022-10-04 2022-10-27 Red 2022-10-16
#> 10 3 2022-10-04 2022-10-27 Red 2022-10-23
#> # … with 265 more rows