I am trying to do a VLOOKUP-style lookup in R and have tried several approaches. For each specific time, I want to look up the time group it falls into.
[df1]
Start_time
18:48:07
18:48:08
18:48:09
18:48:10
18:48:11
18:48:12
...
[df2]
Time_10min Time_Groupings
0:00 00:00:00>0:09:59
0:10 00:10:00>0:19:59
0:20 00:20:00>0:29:59
0:30 00:30:00>0:39:59
0:40 00:40:00>0:49:59
0:50 00:50:00>0:59:59
1:00 01:00:00>1:09:59
1:10 01:10:00>1:19:59
...
Basically, I want to find which "Time_Groupings" interval in df2 each "Start_time" falls into, and add a new column to df1 called Grouping that holds the matching interval, like the VLOOKUP function in Excel. So the output should look something like this:
Start_time Grouping
18:48:07 18:40:00>18:49:59
18:48:08 18:40:00>18:49:59
18:48:09 18:40:00>18:49:59
18:48:10 18:40:00>18:49:59
18:48:11 18:40:00>18:49:59
18:48:12 18:40:00>18:49:59
18:48:13 18:40:00>18:49:59
...
I've tried the following approaches, with no luck so far.
Using findInterval:
df$grouping <- bins$Time_10min[
findInterval(df$Start_time,
sapply(strsplit(bins$Time_Groupings, '-'),
function(x) as.numeric(x[1])))]
Using data.table:
setDT(df)[, df := setDT(bins)[df, bins , on = df$Start_time, roll = "nearest"]]
I haven't used merge or left_join because I'm under the impression that they only match on a single column of exact values, whereas I have a lookup table (Time_10min and Time_Groupings) in which "Start_time" has to be matched against a range.
Any help is appreciated. Thanks!
CodePudding user response:
You could convert the times to numeric seconds and perform a non-equi join.
# Example data
df1 <- read.table(text="Start_time
00:18:07
00:28:08
00:38:09
00:48:10
00:48:11
00:48:12",header=T)
df2 <- read.table(text="Time_10_min Time_Groupings
0:00 00:00:00>0:09:59
0:10 00:10:00>0:19:59
0:20 00:20:00>0:29:59
0:30 00:30:00>0:39:59
0:40 00:40:00>0:49:59
0:50 00:50:00>0:59:59
1:00 01:00:00>1:09:59
1:10 01:10:00>1:19:59", header=T)
library(data.table)
library(lubridate)
setDT(df1)
setDT(df2)
df1[,starttime:=as.numeric(lubridate::hms(Start_time))]
df2[,c('start','end'):=transpose(strsplit(df2$Time_Groupings,">"))]
df2[,start:=as.numeric(lubridate::hms(start))]
df2[,end:=as.numeric(lubridate::hms(end))]
df2[df1,.(Time_Groupings,Start_time),on=.(start<=starttime,end>=starttime)]
#> Time_Groupings Start_time
#> <char> <char>
#> 1: 00:10:00>0:19:59 00:18:07
#> 2: 00:20:00>0:29:59 00:28:08
#> 3: 00:30:00>0:39:59 00:38:09
#> 4: 00:40:00>0:49:59 00:48:10
#> 5: 00:40:00>0:49:59 00:48:11
#> 6: 00:40:00>0:49:59 00:48:12
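For comparison, the findInterval() attempt from the question can be made to work once the times are converted to seconds and the groupings are split on ">" rather than "-". The following is only a minimal base R sketch under a few assumptions: to_seconds() is a hypothetical helper (not from any package), the bins in df2 are sorted in ascending order, and every Start_time is at or after the first bin's lower bound.

```r
# Hypothetical helper: convert "H:MM:SS" strings to seconds since midnight.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Small example data in the same shape as the question's tables.
df1 <- data.frame(Start_time = c("00:18:07", "00:28:08", "00:48:12"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  Time_Groupings = c("00:00:00>0:09:59", "00:10:00>0:19:59", "00:20:00>0:29:59",
                     "00:30:00>0:39:59", "00:40:00>0:49:59"),
  stringsAsFactors = FALSE
)

# Lower bound of each bin in seconds (everything before the ">").
bin_start <- to_seconds(sub(">.*$", "", df2$Time_Groupings))

# For each start time, findInterval() gives the index of the last bin
# whose lower bound is <= that time.
idx <- findInterval(to_seconds(df1$Start_time), bin_start)
df1$Grouping <- df2$Time_Groupings[idx]
df1
```

Note that findInterval() only looks at the lower bounds, so a time past the end of the last bin would still be assigned to that last bin; the non-equi join above checks both ends of each interval.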
CodePudding user response:
A solution without a lookup table, going straight from df1 to the desired output:
library(data.table)
library(lubridate)
setDT(df1)
df1[, group_from := duration(10L * hms(Start_time) %/% period(10L, units = "minutes"), units = "minutes")]
# end of the bin = start of the bin + 10 minutes - 1 second
df1[, group_to := as.period(group_from + dminutes(10L) - dseconds(1L))]
df1[, group_from := as.period(group_from)]
# zero-pad hours, minutes and seconds to build the "from>to" label
df1[, .(Start_time, Time_Groupings = sprintf('%02d:%02d:%02d>%02d:%02d:%02d', group_from@hour, group_from@minute, 0L, group_to@hour, group_to@minute, 59L))]
# Start_time Time_Groupings
# 1: 00:18:07 00:10:00>00:19:59
# 2: 00:28:08 00:20:00>00:29:59
# 3: 00:38:09 00:30:00>00:39:59
# 4: 00:48:10 00:40:00>00:49:59
# 5: 00:48:11 00:40:00>00:49:59
# 6: 00:48:12 00:40:00>00:49:59
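The same idea can also be sketched in base R with plain integer arithmetic on seconds, which avoids the package dependencies. This is only an illustration: ten_min_group() and its width argument are hypothetical names, and it assumes the input is well-formed "HH:MM:SS" text.

```r
# Hypothetical helper: label each "HH:MM:SS" string with its 10-minute bin.
ten_min_group <- function(x, width = 600L) {
  parts <- strsplit(x, ":", fixed = TRUE)
  secs <- vapply(parts,
                 function(p) as.integer(sum(as.numeric(p) * c(3600, 60, 1))),
                 integer(1))
  from <- (secs %/% width) * width  # floor to the start of the bin
  to <- from + width - 1L           # last second of the bin
  fmt <- function(s) {
    s <- as.integer(s)
    sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
  }
  paste0(fmt(from), ">", fmt(to))
}

ten_min_group(c("18:48:07", "00:05:00"))
# [1] "18:40:00>18:49:59" "00:00:00>00:09:59"
```

It could then be applied as df1$Grouping <- ten_min_group(df1$Start_time).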