I would like to expand the rows (> 10 million rows) in my dataframe based on the dates between the startdate and enddate columns
ID | Color | StartDate | EndDate | Days
1 blue 2022/01/01 2022/01/04 4
2 red 2022/01/01 2022/01/02 2
should result in:
ID | Color | StartDate | EndDate | Days | New_Date
1 blue 2022/01/01 2022/01/04 4 2022/01/01
1 blue 2022/01/01 2022/01/04 4 2022/01/02
1 blue 2022/01/01 2022/01/04 4 2022/01/03
1 blue 2022/01/01 2022/01/04 4 2022/01/04
2 red 2022/01/01 2022/01/02 2 2022/01/01
2 red 2022/01/01 2022/01/02 2 2022/01/02
I thought about mutating rowwise with tidyr but couldn't get any satisfying results.
CodePudding user response:
This is a job for data.table, especially with 10 million rows.
df[rep(ID,Days)][, NewDate:=df[, seq(StartDate,EndDate,1), by=ID]$V1]
Output:
ID Color StartDate EndDate Days NewDate
<num> <char> <Date> <Date> <num> <Date>
1: 1 blue 2022-01-01 2022-01-04 4 2022-01-01
2: 1 blue 2022-01-01 2022-01-04 4 2022-01-02
3: 1 blue 2022-01-01 2022-01-04 4 2022-01-03
4: 1 blue 2022-01-01 2022-01-04 4 2022-01-04
5: 2 red 2022-01-01 2022-01-02 2 2022-01-01
6: 2 red 2022-01-01 2022-01-02 2 2022-01-02
Input:
df <- data.table(
ID = c(1,2),
Color = c("blue", "red"),
StartDate = rep(as.Date("2022/01/01"),2),
EndDate = c(as.Date("2022/01/04"), as.Date("2022/01/02")),
Days = c(4,2)
)
CodePudding user response:
Assume you want to "expand" by ID
, you can first change your "Date" columns into class Date
, then use seq()
to generate a sequence of dates. Since This method cannot append rows to your dataframe, therefore it's stored in a list
. Finally unnest
the list into rows.
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(across(ends_with("Date"), ~as.Date(.)),
newDate = list(seq(StartDate, EndDate, by = "day"))) %>%
unnest(cols = newDate)
# A tibble: 6 x 6
# Groups: ID [2]
ID Color StartDate EndDate Days newDate
<int> <chr> <date> <date> <int> <date>
1 1 blue 2022-01-01 2022-01-04 4 2022-01-01
2 1 blue 2022-01-01 2022-01-04 4 2022-01-02
3 1 blue 2022-01-01 2022-01-04 4 2022-01-03
4 1 blue 2022-01-01 2022-01-04 4 2022-01-04
5 2 red 2022-01-01 2022-01-02 2 2022-01-01
6 2 red 2022-01-01 2022-01-02 2 2022-01-02