R: a faster weekday method

Time:03-13

Hi, I am trying to convert dates into their respective weekdays. I have a data set of a million lines, and I am only using the column that holds the dates.

I am currently using

ifelse(wday(data$started_at)==1,7,wday(data$started_at)-1)

I want Monday to be indicated as 1 and Sunday as 7. However, the exact numbering is not essential; I would much rather have a faster program.

As trial data you can use:

   x<- rep("2022-02-01 00:00:04",1000000)

This is what I currently have

   ifelse(wday(x)==1,7,wday(x)-1)

I am trying to make it much faster; it currently takes 17 seconds on my computer.

CodePudding user response:

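The extra ifelse and the second call to wday are not needed here. Curiously, the lookup-table version (y in the benchmark) is slightly faster than using week_start = 1 (x in the benchmark), and it returns an integer instead of the default numeric from wday.

library(lubridate)  # provides wday()

x <- sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 1e7, replace = TRUE)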

z <- c(7L, 1:6)

bench::mark(
  x = wday(x, week_start = 1),
  y = z[wday(x)]
)[c(3,5,7,9)]
    median mem_alloc n_itr total_time
  <bch:tm> <bch:byt> <int>   <bch:tm>
1       1s     534MB     1         1s
2    879ms     534MB     1      879ms
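For readers without lubridate installed, here is a minimal base-R sketch of the same lookup-table idea. It swaps wday for as.POSIXlt(x)$wday, which counts Sunday as 0 rather than 1, so the index is shifted by one; the four sample dates are made up for illustration and are not from the benchmark above.

```r
# Four illustrative dates: Sunday, Monday, Tuesday, Saturday
x <- as.Date(c("2022-01-30", "2022-01-31", "2022-02-01", "2022-02-05"))

# as.POSIXlt()$wday counts Sunday as 0 ... Saturday as 6
wd0 <- as.POSIXlt(x)$wday

# Lookup table: position 1 (Sunday) maps to 7, positions 2..7 map to 1..6
z <- c(7L, 1:6)
iso_lookup <- z[wd0 + 1L]

# The same remap written with ifelse, in the style of the question
iso_ifelse <- ifelse(wd0 == 0, 7, wd0)

# Both approaches agree; the lookup result is an integer vector
stopifnot(all(iso_lookup == iso_ifelse))
print(iso_lookup)
```

Indexing into a small constant vector does the remap in a single vectorised step, which is why it can replace the ifelse entirely.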

CodePudding user response:

There is an argument in lubridate::wday called week_start:

x<- "2022-02-01 00:00:04"
wday(x, week_start = 1)

CodePudding user response:

This should be a little faster since it avoids ifelse. It is a base-R solution that relies on format (note that it returns the weekday as a character string, "1" for Monday through "7" for Sunday):

format(as.Date(x), "%u")

On my laptop this takes about 7.5 seconds, of which 5 seconds are spent on the conversion to Date alone. So if your data is already stored as Date, this solution should be even quicker.
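Since the parsing dominates the run time, one option is to convert to Date once and reuse the result. The sketch below does that and also shows an arithmetic alternative that is not part of the answer above: a Date is stored as days since 1970-01-01, which was a Thursday, so the Monday = 1 weekday can be computed with a modulo. The variable names are illustrative, and no timing is claimed here.

```r
# Small stand-in for the trial data from the question
x <- rep("2022-02-01 00:00:04", 1000)

# Parse once; as.Date() keeps the date part and drops the time of day
d <- as.Date(x)

# Weekday via format(), converted from character to integer
wd_format <- as.integer(format(d, "%u"))

# Arithmetic alternative: day 0 (1970-01-01) was a Thursday (= 4),
# so shifting by 3 before the modulo puts Monday at 1 and Sunday at 7
wd_arith <- (as.integer(d) + 3L) %% 7L + 1L

# Both give 2 (Tuesday) for 2022-02-01
stopifnot(identical(wd_format, wd_arith))
```

The arithmetic version assumes whole-day Date values; it is worth benchmarking on your own data before relying on it.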

If your real data contains many duplicated values (as your example data does), you could possibly speed it up even more: first apply format to each unique value only, and then join the results back to your full data.
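A minimal sketch of that deduplication idea, using unique and match from base R to do the "join". The sample vector and the names ux, uwd and wd are made up for illustration.

```r
# Illustrative data with heavy duplication: a Tuesday and a Sunday
x <- c(rep("2022-02-01 00:00:04", 5), rep("2022-02-06 12:30:00", 3))

# Do the expensive parsing and formatting only once per distinct value
ux  <- unique(x)
uwd <- as.integer(format(as.Date(ux), "%u"))

# Map each original element back to the weekday of its unique value
wd <- uwd[match(x, ux)]
print(wd)
```

The gain depends on how many distinct timestamps there are: with full timestamps down to the second, real data may have far fewer duplicates than the trial data, in which case truncating to the date part before calling unique would help.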
