Home > front end >  R: How to expand date intervals to single dates
R: How to expand date intervals to single dates

Time:09-24

I have a dataset about start and end dates, and I want to expand them to consecutive dates in rows. The dataset looks like this (df1):

id   deg     from             to
1     1      2010-03-01       2010-03-05
1     1      2010-03-20       2010-03-25
1     2      2010-06-01       2010-06-05

And this is the result I want (df2):

id   deg     date            
1     1      2010-03-01       
1     1      2010-03-02       
1     1      2010-03-03
1     1      2010-03-04
1     1      2010-03-05
1     1      2010-03-20 
1     1      2010-03-21 
1     1      2010-03-22 
1     1      2010-03-23
1     1      2010-03-24
1     1      2010-03-25       
1     2      2010-06-01  
1     2      2010-06-02 
1     2      2010-06-03 
1     2      2010-06-04 
1     2      2010-06-05     

Here's the different codes I've tried:

df2 = df1 %>%
  mutate(id= 1:nrow(.)) %>%
  rowwise() %>%
  do(data.frame(id=.$id, date=seq.Date(.$from, .$to, by="days")))

But it keeps showing the error: wrong sign in 'by' argument

Thank you all in advance!

CodePudding user response:

Here is a tidyverse solution:

library(tidyverse)

df %>% 
  rowwise() %>% 
  mutate(new = list(seq(as.Date(from), as.Date(to), by = 1))) %>% 
  select(id, deg, new) %>% 
  unnest()

This gives us:

# A tibble: 16 x 3
      id   deg new       
   <int> <int> <date>    
 1     1     1 2010-03-01
 2     1     1 2010-03-02
 3     1     1 2010-03-03
 4     1     1 2010-03-04
 5     1     1 2010-03-05
 6     1     1 2010-03-20
 7     1     1 2010-03-21
 8     1     1 2010-03-22
 9     1     1 2010-03-23
10     1     1 2010-03-24
11     1     1 2010-03-25
12     1     2 2010-06-01
13     1     2 2010-06-02
14     1     2 2010-06-03
15     1     2 2010-06-04
16     1     2 2010-06-05

Data:

structure(list(id = c(1L, 1L, 1L), deg = c(1L, 1L, 2L), from = structure(c(14669L, 
14688L, 14761L), class = c("IDate", "Date")), to = structure(c(14673L, 
14693L, 14765L), class = c("IDate", "Date"))), row.names = c(NA, 
-3L), .internal.selfref = <pointer: 0x000001e0b8dc1ef0>, class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

We may do this in summarise without the need to use do

library(dplyr)
df1 %>%
    rowwise %>%
    summarise(id, deg = deg, date = seq(from, to, by = 1), .groups = "drop")

-output

# A tibble: 16 x 3
      id   deg date      
   <int> <int> <date>    
 1     1     1 2010-03-01
 2     1     1 2010-03-02
 3     1     1 2010-03-03
 4     1     1 2010-03-04
 5     1     1 2010-03-05
 6     1     1 2010-03-20
 7     1     1 2010-03-21
 8     1     1 2010-03-22
 9     1     1 2010-03-23
10     1     1 2010-03-24
11     1     1 2010-03-25
12     1     2 2010-06-01
13     1     2 2010-06-02
14     1     2 2010-06-03
15     1     2 2010-06-04
16     1     2 2010-06-05

data

df1 <- structure(list(id = c(1L, 1L, 1L), deg = c(1L, 1L, 2L), from = structure(c(14669, 
14688, 14761), class = "Date"), to = structure(c(14673, 14693, 
14765), class = "Date")), row.names = c(NA, -3L), class = "data.frame")
  •  Tags:  
  • r
  • Related