Parsing the string "823 PM" to the time 20:23 in R like with Python datetime

Time:11-29

I have date-time strings like this: "25 November, 2021 802 PM". To you and me, that's 2 minutes past 8 pm. Python's datetime has no problem with it either.

>>> import datetime
>>> datetime.datetime.strptime("25 November, 2021 802 PM", "%d %B, %Y %I%M %p")
datetime.datetime(2021, 11, 25, 20, 2)

I need to do this in R.

I tried this:

> lubridate::parse_date_time("25 November, 2021 802 PM", 
                             orders = '%d %B, %Y %I%M %p', 
                             exact = T, truncated = T)
[1] NA
Warning message:
 1 failed to parse. 

To show that the lubridate call itself is valid, here is a variant that does return a result, though not the one I want:

> lubridate::parse_date_time("25 November, 2021 102 PM", orders = '%d %B, %Y %I%M %p', exact = T, truncated = T)
[1] "2021-11-25 22:02:00 UTC"

As we can see, 102 is split into 10 (the hour) and 2 (the minute), when it should be 1 (the hour) and 02 (the minute). Presumably the same greedy two-digit read is why 802 fails outright: 80 is not a valid hour.

Same happens with R's strptime.

> strptime("25 November, 2021 102 PM", format = '%d %B, %Y %I%M %p')
[1] "2021-11-25 22:02:00 GMT"

Is there a way to get R to parse e.g. 823 PM as 20:23, and 123 PM as 13:23, etc.?

Correction: it seems Python's datetime does not work in general either. It gives the required result for the example above, but not for other inputs, e.g.

>>> import datetime
>>> datetime.datetime.strptime("25 November, 2021 102 PM", "%d %B, %Y %I%M %p")
datetime.datetime(2021, 11, 25, 22, 2)

CodePudding user response:

The key is to place a colon separator between the hours and the minutes.
In base R this can be done as follows, where datetimes is a character vector of the input strings:

as.POSIXct(sub('([0-9]{2} [AP]M)', ':\\1', datetimes), format = '%d %B, %Y %I:%M %p')
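As a rough illustration of the one-liner above, here is a self-contained sketch using sample strings from this thread. The end-of-string anchor ($) and tz = "UTC" are additions for the example, not part of the original answer, and the month names and AM/PM markers are assumed to be parsed in an English locale.

```r
# Sample inputs taken from the question and the other answers in this thread
datetimes <- c("25 November, 2021 802 PM",
               "25 November, 2021 102 PM",
               "26 November, 2021 1209 PM")

# Insert a colon before the last two digits of the time.
# The trailing $ (an addition) ties the match to the end of the string.
with_colon <- sub("([0-9]{2} [AP]M)$", ":\\1", datetimes)

# tz = "UTC" is an assumption made so the printed result does not depend
# on the local time zone; %B and %p assume an English locale.
parsed <- as.POSIXct(with_colon, format = "%d %B, %Y %I:%M %p", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
#> [1] "2021-11-25 20:02" "2021-11-25 13:02" "2021-11-26 12:09"
```

With the colon in place, the hour field can no longer swallow the first digit of the minutes, so 102 PM becomes 13:02 rather than 22:02.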

CodePudding user response:

The time part has to be in 0102 (HHMM) format to work with %I%M, e.g.

str_12
[1] "25 November, 2021 102 PM"

frmt_time <- function(str){
  str_tmp <- unlist(strsplit(str," "))
  str_n <- paste0(
    c(str_tmp[1:3], 
    ifelse(nchar(str_tmp[4]) < 4,  # pad 3-digit times such as "102" to "0102"
      paste0(0,str_tmp[4]), 
      str_tmp[4] ),
    str_tmp[5]),
  collapse=" " )

  as.POSIXct(str_n, format="%d %B, %Y %I%M %p")
}

frmt_time(str_12)
[1] "2021-11-25 13:02:00 CET"
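The same zero-padding idea can also be applied to a whole character vector at once. The following is only a sketch: pad_time is a hypothetical helper name, the regular expression assumes every string ends in a 3- or 4-digit time followed by AM or PM, and tz = "UTC" and an English locale are assumed for the example.

```r
# Hypothetical helper: pad a 3-digit time such as "102" to "0102".
# Strings whose time already has 4 digits do not match and are returned unchanged.
pad_time <- function(x) {
  sub(" ([0-9]{3}) ([AP]M)$", " 0\\1 \\2", x)
}

x <- c("25 November, 2021 102 PM",
       "25 November, 2021 802 PM",
       "25 November, 2021 1202 PM")

padded <- pad_time(x)

# With a fixed HHMM width, %I%M splits hour and minute unambiguously.
# tz = "UTC" is an assumption made for reproducible output.
res <- as.POSIXct(padded, format = "%d %B, %Y %I%M %p", tz = "UTC")

format(res, "%H:%M")
#> [1] "13:02" "20:02" "12:02"
```

Because sub() is vectorised, no strsplit() or indexing by word position is needed, which may be convenient when the strings sit in a data frame column.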

CodePudding user response:

All my date-time strings have a common form and, as answered by @Andre Wildberg, something like 102 is never going to be parsed as 1:02, whereas 1-02 or 1:02 works fine. So here is my solution. The main thing is to place a colon before the last two digits of the 3- or 4-digit hour-minute string:

library(tidyverse)
library(lubridate)
library(stringr)

c("26 November, 2021 136 PM", 
  "26 November, 2021 415 AM", 
  "26 November, 2021 1209 PM",
  "22 November, 2021 211 PM",
  "25 November, 2021 732 PM") %>% 
  str_replace("(.*), ([0-9]{4}) ([12]*[0-9])([0-5][0-9]) (.*)", 
              "\\1 \\2, \\3:\\4 \\5") %>% 
  parse_date_time("d m y, H:M %p")
#> [1] "2021-11-26 13:36:00 UTC" "2021-11-26 04:15:00 UTC"
#> [3] "2021-11-26 12:09:00 UTC" "2021-11-22 14:11:00 UTC"
#> [5] "2021-11-25 19:32:00 UTC"