I have date-time strings like "25 November, 2021 802 PM". To you and me, that's 2 minutes past 8 PM. Python's datetime
has no problem with it either.
>>> import datetime
>>> datetime.datetime.strptime("25 November, 2021 802 PM", "%d %B, %Y %I%M %p")
datetime.datetime(2021, 11, 25, 20, 2)
I need to do this in R.
I tried this:
> lubridate::parse_date_time("25 November, 2021 802 PM",
orders = '%d %B, %Y %I%M %p',
exact = T, truncated = T)
[1] NA
Warning message:
1 failed to parse.
And just to show that my lubridate code is not simply malformed, here's an alternative input that does return a result, though not the one I want:
> lubridate::parse_date_time("25 November, 2021 102 PM", orders = '%d %B, %Y %I%M %p', exact = T, truncated = T)
[1] "2021-11-25 22:02:00 UTC"
As we can see, 102 is being split into 10 (the hour) and 2 (the minute), whereas it should be 1 (the hour) and 02 (the minute).
The same happens with R's strptime.
> strptime("25 November, 2021 102 PM", format = '%d %B, %Y %I%M %p')
[1] "2021-11-25 22:02:00 GMT"
Is there a way to get R to parse e.g. 823 PM as 20:23, and 123 PM as 13:23, etc.?
Correction: it seems Python's datetime does not work in general either. It works as required in the example above, but not in other examples, e.g.
>>> import datetime
>>> datetime.datetime.strptime("25 November, 2021 102 PM", "%d %B, %Y %I%M %p")
datetime.datetime(2021, 11, 25, 22, 2)
CodePudding user response:
The key is placing a colon separator between the hours and minutes.
In base R this can be done with:
as.POSIXct(sub('([0-9]{2} [AP]M)', ':\\1', datetimes), format = '%d %B, %Y %I:%M %p')
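As a rough end-to-end sketch of the line above: `datetimes` is not defined in the answer, so the vector below is hypothetical sample input, and `tz = "UTC"` is an assumption added only to make the result independent of the local time zone. It also assumes an English locale, since `%B` and `%p` are locale-dependent.

```r
# Hypothetical sample input; `datetimes` is not defined in the answer above.
datetimes <- c("25 November, 2021 802 PM",
               "25 November, 2021 102 PM",
               "26 November, 2021 1209 PM")

# Insert a colon before the two digits that directly precede AM/PM,
# e.g. "802 PM" becomes "8:02 PM" and "1209 PM" becomes "12:09 PM".
with_colon <- sub("([0-9]{2} [AP]M)", ":\\1", datetimes)

# tz = "UTC" is an assumption; without it the local time zone is used.
parsed <- as.POSIXct(with_colon, format = "%d %B, %Y %I:%M %p", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
# [1] "2021-11-25 20:02" "2021-11-25 13:02" "2021-11-26 12:09"
```

With the colon in place, `%I` can no longer swallow the first digit of the minutes, which is what went wrong with 102.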
CodePudding user response:
The time part has to be in 0102 (HHMM) format to work with %I%M, e.g.
str_12
[1] "25 November, 2021 102 PM"
frmt_time <- function(str){
  # split into day, month, year, time and AM/PM
  str_tmp <- unlist(strsplit(str, " "))
  str_n <- paste0(
    c(str_tmp[1:3],
      # left-pad a 3-digit time such as 102 to 0102
      ifelse(nchar(str_tmp[4]) < 4,
             paste0(0, str_tmp[4]),
             str_tmp[4]),
      str_tmp[5]),
    collapse = " ")
  as.POSIXct(str_n, format = "%d %B, %Y %I%M %p")
}
frmt_time(str_12)
[1] "2021-11-25 13:02:00 CET"
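The same zero-padding idea could also be applied to a whole vector at once with a regular expression instead of splitting each string. This is only a sketch: `pad_time` is a hypothetical helper name, the input vector is made up for illustration, and `tz = "UTC"` plus an English locale are assumptions.

```r
# Hypothetical helper: left-pad a 3-digit time (e.g. "102") to HHMM ("0102").
# Four-digit times such as "1209" do not match the pattern and stay unchanged.
pad_time <- function(x) {
  sub(" ([0-9]{3}) ([AP]M)$", " 0\\1 \\2", x)
}

# Made-up sample input in the same form as the question's strings.
strs <- c("25 November, 2021 102 PM",
          "25 November, 2021 802 PM",
          "26 November, 2021 1209 PM")

padded <- pad_time(strs)

# tz = "UTC" is an assumption; %B and %p depend on the locale.
parsed <- as.POSIXct(padded, format = "%d %B, %Y %I%M %p", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M")
# [1] "2021-11-25 13:02" "2021-11-25 20:02" "2021-11-26 12:09"
```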
CodePudding user response:
Given that all my date-time strings have a common form, and given that, as answered by @Andre Wildberg, there's no way something like 102 is going to be parsed as 1:02, whereas 1-02 or 1:02 will be fine, here's my solution. The main thing is to place a colon before the last two digits of the 3- or 4-digit hour-minute string:
library(tidyverse)
library(lubridate)
library(stringr)
c("26 November, 2021 136 PM",
"26 November, 2021 415 AM",
"26 November, 2021 1209 PM",
"22 November, 2021 211 PM",
"25 November, 2021 732 PM") %>%
str_replace("(.*), ([0-9]{4}) ([12]*[0-9])([0-5][0-9]) (.*)",
"\\1 \\2, \\3:\\4 \\5") %>%
parse_date_time("d m y, H:M %p")
#> [1] "2021-11-26 13:36:00 UTC" "2021-11-26 04:15:00 UTC"
#> [3] "2021-11-26 12:09:00 UTC" "2021-11-22 14:11:00 UTC"
#> [5] "2021-11-25 19:32:00 UTC"