Converting date sequences to data frame

Time:12-01

I have a data frame with 9 observations, each consisting of two dates, like this:

df <- data.frame(date1 = c("2018-11-01", "2018-10-28", "2019-01-22", "2019-03-22", "2018-10-03", "2018-09-10", "2020-07-01", "2018-03-02", "2018-11-09"),
                 date2 = c("2018-12-31", "2018-12-31", "2018-12-31", "2019-12-31", "2018-12-31", "2018-12-31", "2020-12-31", "2018-12-31", "2018-12-31"))

For every pair of dates I want to extract the sequence of months between them and write it to a new data frame. For a single pair I use seq(month(date1), month(date2)). This works fine, but not when date1 and date2 are vectors of length > 1. I tried rowwise() and also tried looping through the original data frame, but nothing worked.
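The single-pair call breaks on vectors because seq() expects one 'from' and one 'to' value. A minimal base-R sketch of applying it pair by pair with Map() follows; it is an illustration, not code from the answers below. It assumes base R's format(x, "%m") as a stand-in for lubridate::month(), and the names m1, m2 and month_seqs are hypothetical:

```r
# Two of the question's pairs, as Date vectors
date1 <- as.Date(c("2018-11-01", "2018-03-02"))
date2 <- as.Date(c("2018-12-31", "2018-12-31"))

# Month numbers for every element at once (format() is vectorised)
m1 <- as.integer(format(date1, "%m"))
m2 <- as.integer(format(date2, "%m"))

# seq() wants a single 'from' and 'to', so Map() calls it once per pair
month_seqs <- Map(seq, m1, m2)
month_seqs   # a list holding 11:12 and 3:12
```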

I tried:

df %>%
  rowwise() %>%
  as.data.frame(df[i,])

or something like:

for(i in 1:nrow(df)){
   as.data.frame(df[i,])
  i = i + 1
   }

What I need is a separate data frame for each pair of dates, holding that pair's sequence of months, e.g. df1, df2, df3, and so on. Any help or ideas would be highly appreciated. Thank you.

Answer:

Since you're using lubridate and dplyr, here's a way using these, plus tidyr's unnest_longer() and dplyr's (experimental) group_split():

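library(dplyr)
library(lubridate)
library(tidyr)   # unnest_longer() comes from tidyr

df |>
  # parse both character columns as Date
  mutate(across(everything(), ymd)) |>
  # one group per pair of dates
  group_by(date1, date2) |>
  # store each pair's month sequence in a list column
  mutate(new = list(seq(month(date1), month(date2)))) |>
  # one row per month
  unnest_longer(new) |>
  # one tibble per pair, dropping the grouping columns
  group_split(.keep = FALSE)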

Output (group_split() returns the groups sorted by date1 and date2, not in the original row order):

[[1]]
# A tibble: 10 × 1
     new
   <int>
 1     3
 2     4
 3     5
 4     6
 5     7
 6     8
 7     9
 8    10
 9    11
10    12

[[2]]
# A tibble: 4 × 1
    new
  <int>
1     9
2    10
3    11
4    12

[[3]]
# A tibble: 3 × 1
    new
  <int>
1    10
2    11
3    12

[[4]]
# A tibble: 3 × 1
    new
  <int>
1    10
2    11
3    12

[[5]]
# A tibble: 2 × 1
    new
  <int>
1    11
2    12

[[6]]
# A tibble: 2 × 1
    new
  <int>
1    11
2    12

[[7]]
# A tibble: 12 × 1
     new
   <int>
 1     1
 2     2
 3     3
 4     4
 5     5
 6     6
 7     7
 8     8
 9     9
10    10
11    11
12    12

[[8]]
# A tibble: 10 × 1
     new
   <int>
 1     3
 2     4
 3     5
 4     6
 5     7
 6     8
 7     9
 8    10
 9    11
10    12

[[9]]
# A tibble: 6 × 1
    new
  <int>
1     7
2     8
3     9
4    10
5    11
6    12
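One caveat: seq(month(date1), month(date2)) works on month numbers only, so the year is lost when a pair spans a year boundary. A minimal sketch of the difference, assuming a hypothetical pair and using base R in place of lubridate (first_of_month() is a hypothetical helper):

```r
# A hypothetical pair that crosses a year boundary
from <- as.Date("2018-11-15")
to   <- as.Date("2019-02-10")

# Month numbers alone drop the year, so seq() counts down: 11, 10, ..., 2
by_number <- seq(as.integer(format(from, "%m")), as.integer(format(to, "%m")))

# Stepping over real dates keeps the year; anchor both ends on the 1st of the month
first_of_month <- function(d) as.Date(format(d, "%Y-%m-01"))
by_date <- seq(first_of_month(from), first_of_month(to), by = "month")

by_number
format(by_date, "%Y-%m")   # "2018-11" "2018-12" "2019-01" "2019-02"
```

Note that seq() with by = "month" requires the start to come before the end; for a reversed pair such as the third row of df, by = "-1 month" counts backwards instead.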

Update: Although I would recommend keeping the data frames in a list, you could add them to the workspace using list2env:

df |>
  mutate(across(everything(), ymd)) |>
  group_by(date1, date2) |>
  mutate(new = list(seq(month(date1), month(date2)))) |>
  unnest_longer(new) |>
  group_split(.keep = FALSE) -> listdf

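# name the list elements monthdf1, monthdf2, ...
names(listdf) <- paste0("monthdf", seq_along(listdf))
# create one object per list element in the global environment
list2env(listdf, .GlobalEnv)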
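list2env() turns each named list element into a separate object in the target environment. A small sketch with hypothetical stand-in data frames (not the real group_split() output), writing into a fresh environment instead of .GlobalEnv so nothing in the workspace is overwritten; it also shows why keeping the list is convenient:

```r
# Hypothetical stand-ins for the tibbles returned by group_split()
listdf <- list(data.frame(new = 11:12), data.frame(new = 9:12))
names(listdf) <- paste0("monthdf", seq_along(listdf))

# list2env() creates one object per named element; a fresh environment is used
# here so that nothing in the global workspace gets overwritten
env <- new.env()
list2env(listdf, envir = env)
ls(env)                   # "monthdf1" "monthdf2"

# With the list kept intact, all data frames can be handled in one call
sapply(listdf, nrow)      # monthdf1 = 2, monthdf2 = 4
```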

Answer:

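You can loop through every row with purrr::pmap() instead of rowwise():

library(dplyr)
library(purrr)

df %>%
  mutate(across(everything(), as.Date)) %>%          # convert both columns to Date
  pmap(~ as.Date(..1:..2, origin = "1970-01-01"))    # ..1 = date1, ..2 = date2 of each row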

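This returns a list because the sequences have different lengths. If they all had the same length, you could build a data frame directly with pmap_dfr() or pmap_dfc(). Note that ..1:..2 steps one day at a time, so each element holds daily dates rather than months, and it counts backwards when date1 is later than date2 (as in the third row).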

Result:

[[1]]
[1] "2018-11-01" "2018-11-02" "2018-11-03" "2018-11-04" "2018-11-05" ...

[[2]]
[1] "2018-10-28" "2018-10-29" "2018-10-30" "2018-10-31" "2018-11-01" ...

[[3]]
[1] "2019-01-22" "2019-01-21" "2019-01-20" "2019-01-19" "2019-01-18" ...

[[4]]
[1] "2019-03-22" "2019-03-23" "2019-03-24" "2019-03-25" "2019-03-26" ...

[[5]]
[1] "2018-10-03" "2018-10-04" "2018-10-05" "2018-10-06" "2018-10-07" ...

[[6]]
[1] "2018-09-10" "2018-09-11" "2018-09-12" "2018-09-13" "2018-09-14" ...

[[7]]
[1] "2020-07-01" "2020-07-02" "2020-07-03" "2020-07-04" "2020-07-05" ...

[[8]]
[1] "2018-03-02" "2018-03-03" "2018-03-04" "2018-03-05" "2018-03-06" ...

[[9]]
[1] "2018-11-09" "2018-11-10" "2018-11-11" "2018-11-12" "2018-11-13" ...
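If one data frame is acceptable, sequences of different lengths can still live together in long format, with an id column recording which row of df each month came from. The sketch below is a base-R variant (not the answer's code) that steps by month instead of by day. It uses three of the question's rows where date1 precedes date2, because seq() with by = "month" errors when the start is later than the end; the names pieces and long are hypothetical:

```r
df <- data.frame(date1 = as.Date(c("2018-11-01", "2018-10-28", "2020-07-01")),
                 date2 = as.Date(c("2018-12-31", "2018-12-31", "2020-12-31")))

# One small data frame per row: the row id plus the 1st of each month in range
pieces <- lapply(seq_len(nrow(df)), function(i) {
  start <- as.Date(format(df$date1[i], "%Y-%m-01"))
  end   <- as.Date(format(df$date2[i], "%Y-%m-01"))
  data.frame(id = i, month = seq(start, end, by = "month"))
})

# Sequences of different lengths stack fine once each row carries its id
long <- do.call(rbind, pieces)
long

# Splitting by id recovers one month vector per original pair
split(long$month, long$id)
```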