I have a dataset about start and end dates, and I want to expand them to consecutive dates in rows. The dataset looks like this (df1):
id deg from to
1 1 2010-03-01 2010-03-05
1 1 2010-03-20 2010-03-25
1 2 2010-06-01 2010-06-05
And this is the result I want (df2):
id deg date
1 1 2010-03-01
1 1 2010-03-02
1 1 2010-03-03
1 1 2010-03-04
1 1 2010-03-05
1 1 2010-03-20
1 1 2010-03-21
1 1 2010-03-22
1 1 2010-03-23
1 1 2010-03-24
1 1 2010-03-25
1 2 2010-06-01
1 2 2010-06-02
1 2 2010-06-03
1 2 2010-06-04
1 2 2010-06-05
Here's the different codes I've tried:
df2 = df1 %>%
mutate(id= 1:nrow(.)) %>%
rowwise() %>%
do(data.frame(id=.$id, date=seq.Date(.$from, .$to, by="days")))
But it keeps showing the error: wrong sign in 'by' argument
Thank you all in advance!
CodePudding user response:
Here is a tidyverse
solution:
library(tidyverse)
df %>%
rowwise() %>%
mutate(new = list(seq(as.Date(from), as.Date(to), by = 1))) %>%
select(id, deg, new) %>%
unnest()
This gives us:
# A tibble: 16 x 3
id deg new
<int> <int> <date>
1 1 1 2010-03-01
2 1 1 2010-03-02
3 1 1 2010-03-03
4 1 1 2010-03-04
5 1 1 2010-03-05
6 1 1 2010-03-20
7 1 1 2010-03-21
8 1 1 2010-03-22
9 1 1 2010-03-23
10 1 1 2010-03-24
11 1 1 2010-03-25
12 1 2 2010-06-01
13 1 2 2010-06-02
14 1 2 2010-06-03
15 1 2 2010-06-04
16 1 2 2010-06-05
Data:
structure(list(id = c(1L, 1L, 1L), deg = c(1L, 1L, 2L), from = structure(c(14669L,
14688L, 14761L), class = c("IDate", "Date")), to = structure(c(14673L,
14693L, 14765L), class = c("IDate", "Date"))), row.names = c(NA,
-3L), .internal.selfref = <pointer: 0x000001e0b8dc1ef0>, class = c("tbl_df",
"tbl", "data.frame"))
CodePudding user response:
We may do this in summarise
without the need to use do
library(dplyr)
df1 %>%
rowwise %>%
summarise(id, deg = deg, date = seq(from, to, by = 1), .groups = "drop")
-output
# A tibble: 16 x 3
id deg date
<int> <int> <date>
1 1 1 2010-03-01
2 1 1 2010-03-02
3 1 1 2010-03-03
4 1 1 2010-03-04
5 1 1 2010-03-05
6 1 1 2010-03-20
7 1 1 2010-03-21
8 1 1 2010-03-22
9 1 1 2010-03-23
10 1 1 2010-03-24
11 1 1 2010-03-25
12 1 2 2010-06-01
13 1 2 2010-06-02
14 1 2 2010-06-03
15 1 2 2010-06-04
16 1 2 2010-06-05
data
df1 <- structure(list(id = c(1L, 1L, 1L), deg = c(1L, 1L, 2L), from = structure(c(14669,
14688, 14761), class = "Date"), to = structure(c(14673, 14693,
14765), class = "Date")), row.names = c(NA, -3L), class = "data.frame")