Hi, I am trying to convert dates into their respective weekdays. I have a data set of about a million rows, and I am only using the column that holds the dates.
I am currently using:
ifelse(wday(data$started_at)==1,7,wday(data$started_at)-1)
I want Monday to be indicated as 1 and Sunday as 7; however, I do not really care about the numbering, I would much rather have a faster program.
As trial data you can use:
x<- rep("2022-02-01 00:00:04",1000000)
This is what I currently have:
ifelse(wday(x)==1,7,wday(x)-1)
I am trying to make this much faster; it currently takes 17 seconds on my computer.
CodePudding user response:
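The extra ifelse and the second call to wday are not needed here, because wday has a week_start argument. Curiously, indexing a lookup vector with the result of wday (y in the benchmark below) is slightly faster, and gives an integer instead of the default numeric from wday.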
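library(lubridate)
# 1e7 random dates between 1999-01-01 and 2000-01-01
x <- sample(seq(as.Date("1999/01/01"), as.Date("2000/01/01"), by = "day"), 1e7, replace = TRUE)
# Lookup vector: by default wday() returns 1 for Sunday, so position 1 maps to 7 and positions 2..7 map to 1..6
z <- c(7L, 1:6)
bench::mark(
  x = wday(x, week_start = 1),
  y = z[wday(x)]
)[c(3,5,7,9)]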
    median mem_alloc n_itr total_time
  <bch:tm> <bch:byt> <int>   <bch:tm>
1       1s     534MB     1         1s
2    879ms     534MB     1      879ms
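The lookup-vector idea can also be tried without lubridate. The following is only a sketch under an assumption that is not part of the answer above: it swaps wday() for base R's as.POSIXlt()$wday, which counts from 0 (Sunday) to 6 (Saturday), and the names d, sunday_first and monday_first are made up for illustration.

```r
# Sketch: remap day-of-week codes with a lookup vector instead of ifelse().
# Assumption: base-R stand-in for lubridate::wday();
# as.POSIXlt()$wday runs from 0 (Sunday) to 6 (Saturday).
d <- as.Date(c("2022-01-30", "2022-01-31", "2022-02-01"))  # Sunday, Monday, Tuesday

sunday_first <- as.POSIXlt(d)$wday + 1L  # 1 = Sunday ... 7 = Saturday
z <- c(7L, 1:6)                          # position 1 (Sunday) -> 7, position 2 (Monday) -> 1, ...
monday_first <- z[sunday_first]          # 1 = Monday ... 7 = Sunday, as integers

print(monday_first)
```

Indexing z is a single vectorised subset, so the weekday is computed once per element rather than twice as in the ifelse version.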
CodePudding user response:
There is an argument in lubridate::wday called week_start:
x<- "2022-02-01 00:00:04"
wday(x, week_start = 1)
CodePudding user response:
This should be a little faster since I do not use ifelse. Instead, I use this base-R solution, which relies on format:
format(as.Date(x), "%u")
On my laptop this takes about 7.5 seconds, of which 5 seconds is just the conversion to Date. So if your data is already stored as Date, this solution should be even quicker.
If your real data contains a lot of duplicated values (as in your example data), you could possibly speed it up even more: first apply format to each unique value only, then map the results back onto your full data.
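A minimal sketch of that unique-value idea, assuming the timestamps are stored as character strings as in the trial data. The names ux, u_day and weekday are made up for illustration, and the sample is scaled down to 10,000 rows; how much time this saves depends on how many duplicates the real data has.

```r
# Trial-style data: character timestamps with many duplicates
x <- rep(c("2022-02-01 00:00:04", "2022-02-06 12:30:00"), 5000)

ux <- unique(x)                                 # distinct timestamps only
u_day <- as.integer(format(as.Date(ux), "%u"))  # "%u": 1 = Monday ... 7 = Sunday
weekday <- u_day[match(x, ux)]                  # map the results back to full length

print(table(weekday))
```

The expensive as.Date and format calls run once per distinct value, and match() is the join step that spreads the results back over every row.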