Vlookup in R with time data

Time:10-14

I am trying to do a VLOOKUP in R and have tried several approaches. I am looking up the value for a specific time.

[df1]

Start_time

18:48:07
18:48:08
18:48:09
18:48:10
18:48:11
18:48:12
...

[df2]

Time_10min  Time_Groupings
0:00        00:00:00>0:09:59
0:10        00:10:00>0:19:59
0:20        00:20:00>0:29:59
0:30        00:30:00>0:39:59
0:40        00:40:00>0:49:59
0:50        00:50:00>0:59:59
1:00        01:00:00>1:09:59
1:10        01:10:00>1:19:59
...

Basically, I want to use df2 to find which "Time_Groupings" interval each "Start_time" falls into, and add a new column to df1 called Grouping that holds the matching Time_Groupings value, like the VLOOKUP function in Excel. The output should look something like this:

Start_time  Grouping
18:48:07    18:40:00>18:49:59
18:48:08    18:40:00>18:49:59
18:48:09    18:40:00>18:49:59
18:48:10    18:40:00>18:49:59
18:48:11    18:40:00>18:49:59
18:48:12    18:40:00>18:49:59
18:48:13    18:40:00>18:49:59
...

I've tried the following approaches, with no luck so far.

Using findInterval:

df$grouping <- bins$Time_10min[
  findInterval(df$Start_time,
               sapply(strsplit(bins$Time_Groupings, '-'), 
                      function(x) as.numeric(x[1])))]

Using data.table:

setDT(df)[, df := setDT(bins)[df, bins , on = df$Start_time, roll = "nearest"]]
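For comparison, here is a minimal base-R sketch of the findInterval idea, assuming the bins are sorted, start at 00:00:00 and have no gaps. The findInterval attempt above splits on '-' while the groupings use '>', and as.numeric() cannot parse "HH:MM:SS" strings, so both sides are converted to seconds since midnight first. The helper to_seconds() and the sample rows are hypothetical.

```r
# Hypothetical helper: "H:MM:SS" or "HH:MM:SS" -> seconds since midnight
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical sample rows shaped like the question's df1 and df2
df1 <- data.frame(Start_time = c("00:18:07", "00:28:08", "00:48:10"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  Time_10min     = c("0:00", "0:10", "0:20", "0:30", "0:40", "0:50"),
  Time_Groupings = c("00:00:00>0:09:59", "00:10:00>0:19:59", "00:20:00>0:29:59",
                     "00:30:00>0:39:59", "00:40:00>0:49:59", "00:50:00>0:59:59"),
  stringsAsFactors = FALSE
)

# Lower bound of each bin is the text before ">" (not "-")
bin_start <- to_seconds(sapply(strsplit(df2$Time_Groupings, ">", fixed = TRUE), `[`, 1))

# findInterval() returns, for each time, the index of the last bin start <= that time;
# it assumes bin_start is sorted and that no time falls before the first bin
idx <- findInterval(to_seconds(df1$Start_time), bin_start)
df1$Grouping <- df2$Time_Groupings[idx]
df1
```

Because findInterval() only needs the lower bounds, the end of each interval ("0:09:59" and so on) is never used; this relies on the bins being contiguous.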

I haven't used merge or left_join because I'm under the impression that they only look up a single key column, whereas I have a table (Time_10min and Time_Groupings) in which to look up the "Start_time" variable.
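merge() can still work here if you first derive, in df1, a key column that matches df2's Time_10min label and then join on that single key. A minimal base-R sketch, assuming Time_10min is stored as text such as "0:10" or "18:40" (hour without a leading zero, minutes floored to a multiple of 10); the sample rows are hypothetical. match() performs the same single-column lookup and is the closest base-R analogue of VLOOKUP.

```r
# Hypothetical sample rows shaped like the question's df1 and df2
df1 <- data.frame(Start_time = c("18:48:07", "18:48:12", "00:18:07"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  Time_10min     = c("0:10", "18:40"),
  Time_Groupings = c("00:10:00>0:19:59", "18:40:00>18:49:59"),
  stringsAsFactors = FALSE
)

# Split "HH:MM:SS" into an integer matrix: one row per time, columns h, m, s
hms_parts <- do.call(rbind, lapply(strsplit(df1$Start_time, ":", fixed = TRUE), as.integer))

# Key in Time_10min format: hour without padding, minutes floored to a multiple of 10
df1$Time_10min <- sprintf("%d:%02d", hms_parts[, 1], (hms_parts[, 2] %/% 10L) * 10L)

# Left join on the derived key (merge() may reorder rows)
out <- merge(df1, df2, by = "Time_10min", all.x = TRUE)

# Same lookup with match(), which keeps df1's row order
df1$Grouping <- df2$Time_Groupings[match(df1$Time_10min, df2$Time_10min)]
df1
```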

Any help is appreciated. Thanks!

Answer:

You could convert the times to numeric values (seconds) and perform a non-equi join.

# Example data
df1 <- read.table(text="Start_time
00:18:07
00:28:08
00:38:09
00:48:10
00:48:11
00:48:12",header=T)

df2 <- read.table(text="Time_10_min Time_Groupings
0:00        00:00:00>0:09:59
0:10        00:10:00>0:19:59
0:20        00:20:00>0:29:59
0:30        00:30:00>0:39:59
0:40        00:40:00>0:49:59
0:50        00:50:00>0:59:59
1:00        01:00:00>1:09:59
1:10        01:10:00>1:19:59", header=T)


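library(data.table)
library(lubridate)

setDT(df1)
setDT(df2)

# Start time as seconds since midnight
df1[, starttime := as.numeric(lubridate::hms(Start_time))]
# Split "00:10:00>0:19:59" into interval start and end, also in seconds
df2[, c('start', 'end') := transpose(strsplit(Time_Groupings, ">"))]
df2[, start := as.numeric(lubridate::hms(start))]
df2[, end := as.numeric(lubridate::hms(end))]

# Non-equi join: keep the df2 row whose [start, end] interval contains starttime
df2[df1, .(Time_Groupings, Start_time), on = .(start <= starttime, end >= starttime)]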
#>      Time_Groupings Start_time
#>              <char>     <char>
#> 1: 00:10:00>0:19:59   00:18:07
#> 2: 00:20:00>0:29:59   00:28:08
#> 3: 00:30:00>0:39:59   00:38:09
#> 4: 00:40:00>0:49:59   00:48:10
#> 5: 00:40:00>0:49:59   00:48:11
#> 6: 00:40:00>0:49:59   00:48:12

Answer:

A solution without a lookup table, going straight from df1 to the desired output:

library(data.table)
library(lubridate)

setDT(df1)

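# Floor each start time to its 10-minute bin (as a Duration in minutes)
df1[, group_from := duration(10L * hms(Start_time) %/% period(10L, units = "minutes"), units = "minutes")]
# Bin end = bin start + 10 minutes - 1 second
df1[, group_to := as.period(group_from + dminutes(10L) - dseconds(1L))]
df1[, group_from := as.period(group_from)]
# Format as "HH:MM:SS>HH:MM:SS"; gsub() turns the padding spaces from %-2d into zeros
df1[, .(Start_time, Time_Groupings = gsub(" ", "0", sprintf('%02d:%-2d:%02d>%02d:%-2d:%02d', group_from@hour, group_from@minute, 0L, group_to@hour, group_to@minute, 59L)))]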

#    Start_time    Time_Groupings
# 1:   00:18:07 00:10:00>00:19:59
# 2:   00:28:08 00:20:00>00:29:59
# 3:   00:38:09 00:30:00>00:39:59
# 4:   00:48:10 00:40:00>00:49:59
# 5:   00:48:11 00:40:00>00:49:59
# 6:   00:48:12 00:40:00>00:49:59
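The same arithmetic also works in plain base R, without data.table or lubridate: convert to seconds, floor to a multiple of 600 seconds, and format both ends of the bin. A minimal sketch, assuming Start_time is always a complete "HH:MM:SS" string; fmt() is a hypothetical helper.

```r
# Sample Start_time values from the answers above
df1 <- data.frame(Start_time = c("00:18:07", "00:28:08", "00:38:09",
                                 "00:48:10", "00:48:11", "00:48:12"),
                  stringsAsFactors = FALSE)

# Seconds since midnight
secs <- vapply(strsplit(df1$Start_time, ":", fixed = TRUE),
               function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))

from <- secs %/% 600 * 600   # floor to the 10-minute boundary (600 s)
to   <- from + 599           # last second of the same bin

# Hypothetical helper: seconds -> zero-padded "HH:MM:SS"
fmt <- function(s) sprintf("%02d:%02d:%02d", s %/% 3600, s %% 3600 %/% 60, s %% 60)

df1$Time_Groupings <- paste0(fmt(from), ">", fmt(to))
df1
```

For these inputs the labels match the data.table output above, e.g. "00:10:00>00:19:59" for 00:18:07.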