Lubridate hour() does not function with times derived from parse_date_time()

Time:12-21

I do not understand why a time derived from parse_date_time() cannot be used by another lubridate function. The following produces a data frame in which the dates with AM/PM are parsed correctly.

  library(dplyr)
  library(lubridate)

  dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM", 
                                    "11/25/19 12:00:00 AM", 
                                    "11/25/19 06:00:00 AM", 
                                    "11/25/19 12:00:00 PM", 
                                    "11/25/19 06:00:00 PM", 
                                    "11/26/19 12:00:00 AM"), 
                    'date' = c(1:6), 'time' = c(1:6)) %>% 
    mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
           date = date(date_time), 
           time = strftime(date_time, "%H:%M:%S", tz = "UTC"))

When I try to extract the hour from the time column, I get an error:

  dt2 <- dt2 %>% mutate(hour_from_hour = hour(time))

  Error: Problem with mutate() column hour_from_hour.
  i hour_from_hour = hour(time).
  x character string is not in a standard unambiguous format
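The error can be reproduced without the data frame. This is a minimal base R sketch: strftime() returns a character vector, not a date-time, and, as far as I can tell, hour() hands character input to as.POSIXlt(), which cannot interpret a bare "HH:MM:SS" string. The names time_chr and res are only illustrative.

```r
# strftime() formats a date-time as text, so the result is a character vector
time_chr <- strftime(as.POSIXct("2019-11-24 18:00:00", tz = "UTC"),
                     "%H:%M:%S", tz = "UTC")
class(time_chr)

# as.POSIXlt() expects a date part; a time-only string is rejected
res <- tryCatch(as.POSIXlt(time_chr, tz = "UTC"), error = function(e) e)
inherits(res, "error")
```

So the time column is no longer a date-time once strftime() has been applied to it.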

But when I use the original variable "date_time", it works fine.

  dt2 <- dt2 %>% mutate(hour_from_date_time = hour(date_time))

My data sets have varying columns (some contain full date-times, some are already split into date and time). It would be nice if I could use hour() on the time column.

CodePudding user response:

If I understood your question correctly, this code answers it. It first extracts the two hour digits as a character string and then converts them to an integer. The code assumes leading zeros and no leading spaces; the regular expression needs to be edited if differently formatted values are to be handled. The solution is rather simple once one finds which functions to use, but it is not trivial, I think.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)

dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM", 
                                  "11/25/19 12:00:00 AM", 
                                  "11/25/19 06:00:00 AM", 
                                  "11/25/19 12:00:00 PM", 
                                  "11/25/19 06:00:00 PM", 
                                  "11/26/19 12:00:00 AM"), 
                  'date' = c(1:6), 'time' = c(1:6)) %>% 
  mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
         date = date(date_time), 
         time = strftime(date_time,"%H:%M:%S", tz = "UTC"))

# time is a character vector; this assumes the time zone is always UTC
dt2 <- dt2 %>% mutate(hour_from_hour = as.integer(str_extract(time, "^[0-2][0-9]")),
                      hour_from_date_time = hour(date_time))

identical(dt2$hour_from_hour, dt2$hour_from_date_time)
#> [1] TRUE

dt2
#>             date_time       date     time hour_from_hour hour_from_date_time
#> 1 2019-11-24 18:00:00 2019-11-24 18:00:00             18                  18
#> 2 2019-11-25 00:00:00 2019-11-25 00:00:00              0                   0
#> 3 2019-11-25 06:00:00 2019-11-25 06:00:00              6                   6
#> 4 2019-11-25 12:00:00 2019-11-25 12:00:00             12                  12
#> 5 2019-11-25 18:00:00 2019-11-25 18:00:00             18                  18
#> 6 2019-11-26 00:00:00 2019-11-26 00:00:00              0                   0

Created on 2021-12-21 by the reprex package (v2.0.1)
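The extraction step above can also be sketched in base R without stringr. This relies on the same assumption as the regular expression, namely zero-padded "HH:MM:SS" strings with no leading spaces; the vector time_chr is made-up sample data.

```r
# Sample character times in the same shape as the time column
time_chr <- c("18:00:00", "00:00:00", "06:00:00", "12:00:00")

# Take the first two characters and convert them to integer hours
hour_from_time <- as.integer(substr(time_chr, 1, 2))
hour_from_time
```

Unlike the regular expression, substr() does not check that the first two characters are digits, so malformed input produces NA with a warning rather than a non-match.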

CodePudding user response:

R doesn't have a native way to handle times that aren't associated with a date. But you can use a package like hms. For example:

library(tidyverse)
library(lubridate)
library(hms)

dt2 <- data.frame('date_time' = c("11/24/19 06:00:00 PM", 
                                  "11/25/19 12:00:00 AM", 
                                  "11/25/19 06:00:00 AM", 
                                  "11/25/19 12:00:00 PM", 
                                  "11/25/19 06:00:00 PM", 
                                  "11/26/19 12:00:00 AM"), 
                  'date' = c(1:6), 'time' = c(1:6)) %>% 
    mutate(date_time = parse_date_time(date_time, orders = "mdy IMS %p"),
           date = date(date_time), 
           time = as_hms(date_time),
           hour = hour(time))

But to be honest, it's probably better to keep the date_time column and use hour directly on it.
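If some data sets only have the character time column, one possible sketch is a small wrapper that parses the string with lubridate's own hms() before calling hour(). The helper name hour_of is hypothetical and not part of lubridate, and the sample vectors are made up; character input is assumed to be in "HH:MM:SS" form.

```r
library(lubridate)

# Hypothetical helper (not part of lubridate): returns the hour for either
# a date-time vector or a character vector of "HH:MM:SS" strings.
hour_of <- function(x) {
  if (is.character(x)) {
    # lubridate::hms() parses "HH:MM:SS" strings into a Period object;
    # the namespace prefix avoids a clash with hms::hms() if hms is attached
    hour(lubridate::hms(x))
  } else {
    hour(x)
  }
}

times  <- c("18:00:00", "00:00:00", "06:00:00")
stamps <- as.POSIXct(c("2019-11-24 18:00:00", "2019-11-25 00:00:00"), tz = "UTC")

hours_from_chr  <- hour_of(times)
hours_from_dttm <- hour_of(stamps)
```

Inside mutate(), this would be used the same way as hour(), for example mutate(hour_from_hour = hour_of(time)).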
