Home > front end >  Formatting 24-hour time variable to capture observations in different ranges
Formatting 24-hour time variable to capture observations in different ranges

Time:09-17

I currently have a data frame with a column for Start.Time (imported from a *.csv file), and the format is in 24 hour format (e.g., 20:00:00 equals 8pm). My goal is to capture observations with a start time in various intervals (e.g., between 9:00:00 and 10:00:00), which also meet other criteria. However, it seems that R sorts this 'character' variable in a way that does not align with how our day goes (e.g., 14:00:00 is considered a lower value than 9:00:00).

For example, below is a line of code that works as intended, where I am capturing observations on two different trail segments, which had a start time between 8:00:00 and 9:00:00.

RLLtoMist8.9<-sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) & 
                     (dataset1$Start.Time>="8:00" & dataset1$Start.Time < "9:00"),
                  na.rm=TRUE)
RLLtoMist8.9

But, this code below does not work as intended, as R is 'valuing' 9:00:00 as greater than 10:00:00.

RLLtoMist9.10 <-
       sum((dataset1$Trail.Segment==52|dataset1$Trail.Segment==55) & 
           (dataset1$Start.Time>="9:00:00 AM" & dataset1$Start.Time < "10:00:00 AM"),
            na.rm=TRUE)

CodePudding user response:

Using the data.table library:

# convert to data table 
dataset1<-data.table(dataset1)

# format to a date format rather that character
dataset1[, Start.Time := as.POSIXct(Start.Time, format="%H:%M:%S")]

#now do your filtering
dataset1[between(Start.Time, as.POSIXct("09:00:00", format="%H:%M:%S"), as.POSIXct("10:00:00", format="%H:%M:%S")) & (Trail.Segment==52 | Trail.Segment==55)]

CodePudding user response:

It's certainly true that character types are sorted so that "14:00" is less than "9:00". However R has a datetime class which would sort times correctly once a character representation has been parsed.

a <- as.POSIXct("14:00", format="%H:%M")
b <- as.POSIXct("8:00", format="%H:%M")
# test
> a < b
[1] FALSE

You would be able to convert an entire column with:

dataset1$Start.Time <- as.POSIXct(dataset1$Start.Time,  format="%H:%M")

The date of a and b were the system date at the time of conversion. There are packages, such as chron, that let you use just times, but POSIXt objects have dates and times necessarily. See ?DateTimeClasses. The lubridate package also has an 'interval' class and there exist a difftime function in base-R.

  • Related