Home > Mobile >  Why does my custom function not work while mutating a tibble?
Why does my custom function not work while mutating a tibble?

Time:04-13

I created an custom function to add working days to a date. The function depends on the following packages:

library(tidyverse)
library(lubridate)
library(tidyquant)

This is the function I created:

add_workingdays <- function(start_date, number_of_days, switch_count_weekendsholidays = TRUE,  remove_weekends = TRUE, holidays = NULL){
  start_date <- start_date %>% as.Date()
  if (!is.Date(start_date)) stop("add_workingdays(): start_date must be a date.", call. = FALSE)
  target_date <- start_date   number_of_days
  if(switch_count_weekendsholidays){
    target_date_lenght <- tidyquant::WORKDAY_SEQUENCE(start_date, target_date, remove_weekends, holidays = holidays) %>% length()
    while(target_date_lenght != number_of_days) {
      target_date <- target_date   1
      target_date_lenght <- tidyquant::WORKDAY_SEQUENCE(start_date, target_date, remove_weekends, holidays = holidays) %>% length()
    }
  }
  target_date %>% return()
}

When I run the function in the following scenario, it works without problems.

add_workingdays(start_date = '2022-04-08' %>% as.Date(), number_of_days = 5)
[1] "2022-04-14"
'2022-04-08' %>% as.Date() %>% add_workingdays(number_of_days = 5)
[1] "2022-04-14"

But when I try to use it within a mutate function in a tibble, I get error messages I do not understand.

I use the following code and it gives the error at the end:

tibble(
    dates = rep('2022-04-08' %>% as.Date()), #) seq.Date(from = '2022-04-08' %>% as.Date(), by = 'days', length.out = 5),
    days_to_add = rep(10:5)
  ) %>% 
    print() %>% 
    mutate(
      target_date = add_workingdays(start_date = dates, number_of_days = days_to_add)
    )
# A tibble: 6 x 2
  dates      days_to_add
  <date>           <int>
1 2022-04-08          10
2 2022-04-08           9
3 2022-04-08           8
4 2022-04-08           7
5 2022-04-08           6
6 2022-04-08           5
Error in `mutate()`:
! Problem while computing `target_date =
  add_workingdays(start_date = dates, number_of_days =
  days_to_add)`.
Caused by error in `seq.Date()`:
! 'from' must be of length 1
Run `rlang::last_error()` to see where the error occurred.

Can anyone explain to me what I do wrong when using this custom function within a mutate function?

CodePudding user response:

Your function must be either vectorized using Vectorize or applied to each element individually using purrr::map or lapply:

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(tidyquant)
#> Loading required package: PerformanceAnalytics
#> Loading required package: xts
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> 
#> Attaching package: 'xts'
#> The following objects are masked from 'package:dplyr':
#> 
#>     first, last
#> 
#> Attaching package: 'PerformanceAnalytics'
#> The following object is masked from 'package:graphics':
#> 
#>     legend
#> Loading required package: quantmod
#> Loading required package: TTR
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo
#> ══ Need to Learn tidyquant? ════════════════════════════════════════════════════
#> Business Science offers a 1-hour course - Learning Lab #9: Performance Analysis & Portfolio Optimization with tidyquant!
#> </> Learn more at: https://university.business-science.io/p/learning-labs-pro </>

add_workingdays <- function(start_date, number_of_days, switch_count_weekendsholidays = TRUE, remove_weekends = TRUE, holidays = NULL) {
  start_date <- start_date %>% as.Date()
  if (!is.Date(start_date)) stop("add_workingdays(): start_date must be a date.", call. = FALSE)
  target_date <- start_date   number_of_days
  if (switch_count_weekendsholidays) {
    target_date_lenght <- tidyquant::WORKDAY_SEQUENCE(start_date, target_date, remove_weekends, holidays = holidays) %>% length()
    while (target_date_lenght != number_of_days) {
      target_date <- target_date   1
      target_date_lenght <- tidyquant::WORKDAY_SEQUENCE(start_date, target_date, remove_weekends, holidays = holidays) %>% length()
    }
  }
  target_date %>% return()
}

add_workingdays(start_date = "2022-04-08" %>% as.Date(), number_of_days = 5)
#> [1] "2022-04-14"


c("2022-04-08", "2022-04-09") %>%
  as.Date() %>%
  add_workingdays(number_of_days = 5)
#> Error in seq.Date(from = AS_DATE(start_date), to = AS_DATE(end_date), : 'from' must be of length 1

add_workingdays <- Vectorize(add_workingdays)
c("2022-04-08", "2022-04-09") %>%
  as.Date() %>%
  add_workingdays(number_of_days = 5)
#> [1] 19096 19097

tibble(
  dates = rep("2022-04-08" %>% as.Date()), # ) seq.Date(from = '2022-04-08' %>% as.Date(), by = 'days', length.out = 5),
  days_to_add = rep(10:5)
) %>%
  mutate(
    target_date = add_workingdays(start_date = dates, number_of_days = days_to_add)
  )
#> # A tibble: 6 × 3
#>   dates      days_to_add target_date
#>   <date>           <int>       <dbl>
#> 1 2022-04-08          10       19103
#> 2 2022-04-08           9       19102
#> 3 2022-04-08           8       19101
#> 4 2022-04-08           7       19100
#> 5 2022-04-08           6       19097
#> 6 2022-04-08           5       19096

Created on 2022-04-13 by the reprex package (v2.0.0)

CodePudding user response:

The error has nothing to do with mutate. It comes from the improper input for the argument in the tidyquant::WORKDAY_SEQUENCE function. If you check this function by printing it to the console, you get this:

tidyquant::WORKDAY_SEQUENCE #with no parens

#function (start_date, end_date, remove_weekends = TRUE, holidays = NULL) 
#{
#    day_sequence <- DATE_SEQUENCE(start_date, end_date, by = "day")
#    ret_tbl <- tibble::tibble(day_sequence = day_sequence) %>% 
#    ...#and more

which shows that this function uses DATE_SEQUENCE function. Likewise, if you check this function, you'll find that it uses seq.Date function, which requires a single date for the from argument. For example:

seq.Date(from = as.Date("2020-01-01"), to = as.Date("2020-01-03"), by = 'days')
#[1] "2020-01-01" "2020-01-02" "2020-01-03"
WORKDAY_SEQUENCE("2020-01-01", "2020-01-03")
#[1] "2020-01-01" "2020-01-02" "2020-01-03"

If you entry more than one date data to from, you get the same error:

seq.Date(from = c(as.Date("2020-01-01"), as.Date("2020-01-02")), to = as.Date("2020-01-03"), by = 'days')
#Error in seq.Date(c(as.Date("2020-01-01"), as.Date("2020-01-02")), #as.Date("2020-01-03"),  : 
#  'from' must be of length 1

WORKDAY_SEQUENCE(c("2020-01-01", "2020-01-02"), "2020-01-03")
#Error in seq.Date(from = AS_DATE(start_date), to = AS_DATE(end_date),  : 
#  'from' must be of length 1

To avoid this error, when you want to entry a column for start_date, you have to vectorize the WORKDAY_SEQUENCE function, which means you apply the function to each date in the column. As @danloo shows, you can use Vectorize. You can also use *apply family, as follows:

lapply(c("2020-01-01", "2020-01-02"), WORKDAY_SEQUENCE, end_date = "2020-01-03")
#[[1]]
#[1] "2020-01-01" "2020-01-02" "2020-01-03"

#[[2]]
#[1] "2020-01-02" "2020-01-03"

You also need to vectorize length() function. Because lapply returns a list, if you need to return a vector so that it can be used to create a column in a data frame, you can use unlist():

c("2020-01-01", "2020-01-02") %>% 
      lapply(WORKDAY_SEQUENCE, end_date = "2020-01-03") %>% 
      lapply(length) %>% 
      unlist()
#[1] 3 2

Hence, if you'd like to follow the lapply method above, the relevant line in your function should be revised from

target_date_lenght <- tidyquant::WORKDAY_SEQUENCE(start_date,
                      target_date, remove_weekends, 
                      holidays = holidays) %>% 
                      length()

to

target_date_lenght <- start_date %>% 
                       lapply(tidyquant::WORKDAY_SEQUENCE, 
                              end_date = target_date,
                              remove_weekends = remove_weekends, 
                              holidays = holidays) %>% 
                       lapply(length) %>% 
                       unlist()
  • Related