R: a faster weekday method

Hi, I am trying to convert dates into their respective weekdays. I have a data set of a million rows, and I am only using the column with the dates in it.

I am currently using

ifelse(wday(data$started_at)==1,7,wday(data$started_at)-1)

Ideally Monday would be indicated as 1 and Sunday as 7, but the numbering matters less to me than speed; I would much rather have a faster program.

As trial data you can use:

   x<- rep("2022-02-01 00:00:04",1000000)

This is what I currently have

   ifelse(wday(x)==1,7,wday(x)-1)

I am trying to make it much faster; it currently takes 17 seconds on my computer.

CodePudding user response：

The extra ifelse and the second call to wday are not needed here. Curiously, indexing into a lookup vector (expression y in the benchmark) is slightly faster than using week_start = 1, and it gives an integer instead of the default numeric from wday.

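library(lubridate)

x <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by = "day"), 1e7, replace = TRUE)

# lookup vector: wday() returns 1 for Sunday by default, so position 1 maps to 7
z <- c(7L, 1:6)

bench::mark(
  x = wday(x, week_start = 1),
  y = z[wday(x)]
)[c(3,5,7,9)]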
    median mem_alloc n_itr total_time
  <bch:tm> <bch:byt> <int>   <bch:tm>
1       1s     534MB     1         1s
2    879ms     534MB     1      879ms
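
As a quick sanity check that the two benchmarked expressions agree, here is a minimal sketch over one known week. It assumes lubridate is installed and that the lubridate.week.start option has not been changed from its default (Sunday = 1); the names days, lookup, a and b are made up for illustration.

```r
library(lubridate)

# One full week with known weekdays: 2022-01-31 is a Monday.
days <- seq(as.Date("2022-01-31"), by = "day", length.out = 7)

# wday() defaults to Sunday = 1, so position 1 of the lookup maps to 7.
lookup <- c(7L, 1:6)

a <- wday(days, week_start = 1)  # Monday = 1 via the argument
b <- lookup[wday(days)]          # Monday = 1 via the lookup vector

# Both approaches should give the same numbering for every day of the week.
stopifnot(all(a == b))
```

The only difference between the two results should be the type: b is an integer vector, while a is the numeric that wday returns by default.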

CodePudding user response：

There is an argument in lubridate::wday called week_start:

x<- "2022-02-01 00:00:04"
wday(x, week_start = 1)

CodePudding user response：

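This should be a little faster since it avoids ifelse. Instead, it uses a base-R solution that relies on format with "%u" (weekday as a number, Monday = 1, Sunday = 7). Note that format returns character strings, so wrap the call in as.integer if you need numbers: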

format(as.Date(x), "%u")

On my laptop this takes about 7.5 seconds, of which 5 seconds are spent on the conversion to Date alone. So if your data is already stored as Date, this solution should be even quicker.
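
To illustrate the point about conversion cost, here is a small sketch that parses the timestamps once, keeps the result, and derives the weekday from the stored Date column. The data frame df and the column names started_date and weekday are hypothetical; as.integer is added only because format returns text.

```r
# Hypothetical data: timestamps stored as character, as in the question.
df <- data.frame(started_at = rep("2022-02-01 00:00:04", 1000),
                 stringsAsFactors = FALSE)

# Pay the parsing cost once and keep the result as a Date column.
df$started_date <- as.Date(df$started_at)

# "%u" yields "1" (Monday) to "7" (Sunday) as text; convert to integer.
df$weekday <- as.integer(format(df$started_date, "%u"))
```

Any later weekday or date calculation can then reuse started_date without parsing the strings again.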

If your real data contains a lot of duplicated values (as your example data does), you could possibly speed it up even more: first apply format to each unique value only, then join the results back to your full data.
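
The deduplication idea can be sketched in base R with unique and match. This is one possible way to do the "join", not necessarily the fastest; the trial vector and the names ux, uw and weekday are assumptions for illustration.

```r
# Trial data with many repeated values (only two distinct timestamps here).
x <- rep(c("2022-02-01 00:00:04", "2022-02-06 12:30:00"), times = 5000)

# Compute the weekday once per distinct value...
ux <- unique(x)
uw <- as.integer(format(as.Date(ux), "%u"))

# ...then map every element of x back to the result for its unique value.
weekday <- uw[match(x, ux)]
```

If exact timestamps rarely repeat, deduplicating on the date part instead (for example substr(x, 1, 10), assuming every string starts with YYYY-MM-DD) should leave far fewer unique values to convert.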