Home > Net >  How to Disaggregate Monthly Data into Weekly Basis?
How to Disaggregate Monthly Data into Weekly Basis?

Time:11-16

I am trying to disaggregate the monthly data and spread them into weekly data in two ways.

  • First, To find the first Monday from the start date and then create days which are Mondays till the last date (month) of the sequence. And then spread the data within the respective week which is in the month.

  • Second, To create a weekly sequence from start date and end date and spread the data within the respective week which is in the month.

The data which I am working with is given below:

structure(list(`Row Labels` = c("X6", "X7", "X8", "X9"), `2022-11-01` = c(100, 
200, 300, 400), `2022-12-01` = c(160, 200, 300, 400), `2023-01-01` = c(500, 
550, 600, 650)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-4L))

And it looks like this:

enter image description here

The expected output 1 is given below, as you can see all the dates are Mondays:

enter image description here

The expected output 2 is given below:

enter image description here

Is this doable, or is it a bit too much to expect from R?

CodePudding user response:

For Mondays we can create a list of Mondays between the dates in the dataframe, join it with the data in long format, count number of the Mondays for each variable in each month, divide the values by the number of Mondays, and revert back the format to wide;

library(dplyr)
library(tidyr)
library(lubridate)
all_dates <- as.Date(names(df1)[-1])

MON <- seq(min(floor_date(all_dates, "month")), 
           max(ceiling_date(all_dates, "month")),
           by="1 day") %>% 
      .[wday(.,label = TRUE) == "Mon"] %>% 
      data.frame("Mondays" = .) %>% 
      mutate(mmm = format(Mondays, "%Y-%m"))
  
df1 %>% 
  pivot_longer(cols = -`Row Labels`, names_to = "dates") %>% 
  mutate(dates = as.Date(dates),
         mmm = format(dates, "%Y-%m"))  %>%
  right_join(MON, by = "mmm") %>% 
  arrange(mmm) %>% 
  group_by(`Row Labels`, dates) %>% 
  mutate(value = value / n()) %>% 
  ungroup() %>% 
  select(`Row Labels`, Mondays, value) %>% 
  pivot_wider(`Row Labels`, names_from = "Mondays", values_from = "value")
#> # A tibble: 4 x 14
#>   `Row Labels` `2022-11-07` `2022-11-14` `2022-11-21` `2022-11-28` `2022-12-05`
#>   <chr>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 X6                     25           25           25           25           40
#> 2 X7                     50           50           50           50           50
#> 3 X8                     75           75           75           75           75
#> 4 X9                    100          100          100          100          100
#> # ... with 8 more variables: 2022-12-12 <dbl>, 2022-12-19 <dbl>,
#> #   2022-12-26 <dbl>, 2023-01-02 <dbl>, 2023-01-09 <dbl>, 2023-01-16 <dbl>,
#> #   2023-01-23 <dbl>, 2023-01-30 <dbl>

Same principal goes to doing it weekly:

WKLY <- seq(min(floor_date(all_dates, "month")), 
            max(ceiling_date(all_dates, "month")), 
            by="week") %>% 
          data.frame("Weekly" = .) %>% 
          mutate(mmm = format(Weekly, "%Y-%m"))

df1 %>% 
  pivot_longer(cols = -`Row Labels`, names_to = "dates") %>% 
  mutate(dates = as.Date(dates),
         mmm = format(dates, "%Y-%m"))  %>%
  right_join(WKLY, by = "mmm") %>% 
  arrange(mmm) %>% 
  group_by(`Row Labels`, dates) %>% 
  mutate(value = value / n()) %>% 
  ungroup() %>% 
  select(`Row Labels`, Weekly, value) %>% 
  pivot_wider(`Row Labels`, names_from = "Weekly", values_from = "value")
#> # A tibble: 4 x 15
#>   `Row Labels` `2022-11-01` `2022-11-08` `2022-11-15` `2022-11-22` `2022-11-29`
#>   <chr>               <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 X6                     20           20           20           20           20
#> 2 X7                     40           40           40           40           40
#> 3 X8                     60           60           60           60           60
#> 4 X9                     80           80           80           80           80
#> # ... with 9 more variables: 2022-12-06 <dbl>, 2022-12-13 <dbl>,
#> #   2022-12-20 <dbl>, 2022-12-27 <dbl>, 2023-01-03 <dbl>, 2023-01-10 <dbl>,
#> #   2023-01-17 <dbl>, 2023-01-24 <dbl>, 2023-01-31 <dbl>

Data:

df1 <- structure(list(`Row Labels` = c("X6", "X7", "X8", "X9"), 
                      `2022-11-01` = c(100, 200, 300, 400), 
                      `2022-12-01` = c(160, 200, 300, 400), 
                      `2023-01-01` = c(500, 550, 600, 650)), 
                 class = c("tbl_df", "tbl", "data.frame"), 
                 row.names = c(NA, -4L))
  • Related