Home > Software engineering >  Pivot longer with start and end dates
Pivot longer with start and end dates

Time:10-14

I sincerely hope this question has been asked before and if so I would be delighted to be put towards existing threads.

I have data that basically contains information about several time periods marked by their respective start and end dates. I want to transform this data so that every date between these start and end dates is its own observation with all the data associated with the time frame.

So far my best guess using a for-loop for every row and basically creating a new data.frame for each observation and stitching all of them back together. But, I realise this cannot be an even moderately efficient way to do so, and since the datasets I am working with tend to be rather large, I would appreciate a more performative method.

I was hoping to find a way along the lines of the tidyverse pivot_longer, since it seems vaguely connected in my opinion.

The original data looks similar to this example:

 data.frame(Location = c("Berlin", "Amsterdam"),
                            Value = c("A", "B"),
                            Starting = c("2001-01-01", "2001-01-08"),
                            Ending = c("2001-01-03", "2001-01-10"))

   Location Value   Starting     Ending
1    Berlin     A 2001-01-01 2001-01-03
2 Amsterdam     B 2001-01-08 2001-01-10

The desired output would look something like this ideally:

data.frame(Location = rep(c("Berlin", "Amsterdam"), each = 3),
           Date = c("2001-01-01", "2001-01-02", "2001-01-03", "2001-01-08", "2001-01-09", "2001-01-10"),
           Value = rep(c("A", "B"), each = 3))

   Location       Date Value
1    Berlin 2001-01-01     A
2    Berlin 2001-01-02     A
3    Berlin 2001-01-03     A
4 Amsterdam 2001-01-08     B
5 Amsterdam 2001-01-09     B
6 Amsterdam 2001-01-10     B

CodePudding user response:

We could unnest after creating a list column with seq

library(dplyr)
library(purrr)
library(tidyr)
df1 %>% 
  transmute(Location, Value, Date = map2(as.Date(Starting), 
       as.Date(Ending), seq, by = "1 day")) %>%
  unnest(Date)

-output

# A tibble: 6 × 3
  Location  Value Date      
  <chr>     <chr> <date>    
1 Berlin    A     2001-01-01
2 Berlin    A     2001-01-02
3 Berlin    A     2001-01-03
4 Amsterdam B     2001-01-08
5 Amsterdam B     2001-01-09
6 Amsterdam B     2001-01-10
  • Related