I am converting a chr column to POSIXct using the format "%Y-%m-%d %H:%M:%S". However, some of the times in the dataset do not have the seconds portion (%S), so when I convert them, the times without seconds are returned as NA. How do I make sure this does not happen? I want them all returned as date-times, regardless of whether the seconds portion is missing.
The date-times that are in this format ("%Y-%m-%d %H:%M:%S") are returned correctly as POSIXct
But the date-times that are in this format ("%Y-%m-%d %H:%M") are returned as NA
This is the code I used for the conversion:
trips$ended_at <- as.POSIXct(trips$ended_at, format = "%Y-%m-%d %H:%M:%S")
It is a huge dataset with over a million entries, so I don't even know which date-times lack the seconds portion.
Is there a way for those without the seconds portion to just have zeroes added at the end?
For example, 2020-04-29 01:57 would be returned as 2020-04-29 01:57:00 when converted to POSIXct.
Please help!
CodePudding user response:
Here's what I usually do. Check the length of the character strings via nchar() before converting to POSIXct, and store the results in a new column, for example:
trips$check<-nchar(trips$ended_at)
Then check whether all values of trips$ended_at have the same length, and add the missing seconds to those that don't:
trips$ended_at_new<-ifelse(trips$check==19,trips$ended_at,paste(trips$ended_at,":00",sep=""))
You can swap out the 19 for whatever length your datetime format has. Important note: this only works if the seconds are missing at the end of your timestamp, not if the timestamp is shorter than 19 characters for any other reason.
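The steps above can be put together as a short sketch. It assumes the only two shapes in the column are the 19-character form with seconds and the 16-character form without; the vector ended_at and its values are made up for illustration, and tz = "UTC" is an assumption added only so the result does not depend on the local time zone.

```r
# Toy data standing in for trips$ended_at (values are illustrative only)
ended_at <- c("2020-04-29 01:57:00", "2020-04-29 01:57", "2020-04-29 02:03:45")

# Positions of the strings that are exactly 16 characters long,
# i.e. assumed to be "YYYY-MM-DD HH:MM" with the seconds missing
short <- which(nchar(ended_at) == 16)
length(short)  # how many entries need padding

# Pad only those entries, then convert everything with a single format
padded <- ended_at
padded[short] <- paste0(padded[short], ":00")
converted <- as.POSIXct(padded, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Any NA left at this point means some entry had neither expected shape
anyNA(converted)
```

Using which() keeps the positions of the short entries, which also answers the question of which rows lack the seconds portion.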
CodePudding user response:
This is the approach I took, using ifelse() under the assumption that you are dealing with two possibilities: with and without seconds.
date_time <- c("2020-01-18 20:12:16", "2020-01-18 20:12")
# Choose the parsing format by string length; both branches return text in the full format
ifelse(nchar(date_time) == 16,
       format(as.POSIXct(date_time, format = "%Y-%m-%d %H:%M"), "%Y-%m-%d %H:%M:%S"),
       format(as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S"), "%Y-%m-%d %H:%M:%S"))
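One thing to be aware of with this approach: both branches are built with format(), so the ifelse() call yields a character vector rather than POSIXct. A minimal sketch of the extra conversion step follows; the names as_text and result are made up here, and tz = "UTC" is an assumption added for reproducibility.

```r
date_time <- c("2020-01-18 20:12:16", "2020-01-18 20:12")

# Same ifelse() as above: the result is text in the full format
as_text <- ifelse(nchar(date_time) == 16,
                  format(as.POSIXct(date_time, format = "%Y-%m-%d %H:%M", tz = "UTC"),
                         "%Y-%m-%d %H:%M:%S"),
                  format(as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
                         "%Y-%m-%d %H:%M:%S"))
class(as_text)

# Convert the normalized text to POSIXct in one final pass
result <- as.POSIXct(as_text, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(result)
```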
CodePudding user response:
Two methods:
Concatenate the literal :00 onto the end of timestamps that only have hour/minute:
# The single-format conversion leaves the entry without seconds as NA
as.POSIXct(trips$ended_at, format = "%Y-%m-%d %H:%M:%S")
# [1] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [3] "2020-04-29 01:57:00 EDT" NA
# [5] "2020-04-29 01:57:00 EDT"
# Append ":00" only where the string ends in " HH:MM"
gsub("( [0-9]+:[0-9]+)$", "\\1:00", trips$ended_at)
# [1] "2020-04-29 01:57:00" "2020-04-29 01:57:00" "2020-04-29 01:57:00"
# [4] "2020-04-29 01:57:00" "2020-04-29 01:57:00"
as.POSIXct(gsub("( [0-9]+:[0-9]+)$", "\\1:00", trips$ended_at), format = "%Y-%m-%d %H:%M:%S")
# [1] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [3] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [5] "2020-04-29 01:57:00 EDT"
If you have multiple "candidate" formats that you need to try, you can step through them iteratively. This loop iterates through the formats in order, so place the most likely candidates first. If at any point all timestamps have been converted, it exits the for loop early.
candidates <- c("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M")
out <- as.POSIXct(trips$ended_at, format = "%Y-%m-%d %H:%M:%S")
for (fmt in candidates) {
  # Stop as soon as nothing is left unconverted
  if (!any(isna <- is.na(out))) break
  # Retry only the entries that are still NA
  out[isna] <- as.POSIXct(trips$ended_at[isna], format = fmt)
}
out
# [1] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [3] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [5] "2020-04-29 01:57:00 EDT"
Data
trips <- data.frame(ended_at = c("2020-04-29 01:57:00", "2020-04-29 01:57:00", "2020-04-29 01:57:00", "2020-04-29 01:57", "2020-04-29 01:57:00"))
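For reuse, the candidate-format loop could be wrapped in a small function. This is only a sketch: parse_with_formats is a hypothetical name, the sample vector is made up, and tz = "UTC" is an assumption added for reproducibility. Since strptime() ignores trailing characters in the input, it seems safer to list a format with seconds before its seconds-free counterpart; otherwise the shorter format would match first and the seconds would be dropped.

```r
# Hypothetical helper: tries each format in turn on the entries still unparsed
parse_with_formats <- function(x, formats, tz = "UTC") {
  out <- as.POSIXct(x, format = formats[1], tz = tz)
  for (fmt in formats[-1]) {
    todo <- is.na(out)
    if (!any(todo)) break  # everything parsed: stop early
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

# Made-up sample covering three formats plus one unparseable entry
ended_at <- c("2020-04-29 01:57:00", "2020-04-29 01:57",
              "29/04/2020 01:57:30", "not a date")

parsed <- parse_with_formats(
  ended_at,
  c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M:%S")
)

# Positions that matched none of the formats, worth inspecting by hand
which(is.na(parsed))
```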