R datetime series missing values

I am converting from chr to POSIXct using the format "%Y-%m-%d %H:%M:%S". However, some of the times in the dataset do not have the seconds portion (%S), so when I convert to date-time, the times without seconds are returned as NA. How do I make sure this does not happen? I want them all returned as date-times, regardless of whether the seconds portion is missing.

The date-times that are in this format ("%Y-%m-%d %H:%M:%S") are returned correctly as POSIXct.

But the date-times that are in this format ("%Y-%m-%d %H:%M") are returned as NA.

This is the code I used for the conversion:

trips$ended_at <- as.POSIXct(trips$ended_at, format = "%Y-%m-%d %H:%M:%S")

It is a huge dataset with over a million entries, so I don't even know which date-times lack the seconds portion. Is there a way to pad those without the seconds portion with zeroes at the end? For example, 2020-04-29 01:57 would be returned as 2020-04-29 01:57:00 when converted to POSIXct.

Please help!

CodePudding user response：

Here's what I usually do. Check the length of the character strings with nchar() before converting to POSIXct, and store the results in a new column, for example:

trips$check<-nchar(trips$ended_at)

Then check whether all trips$ended_at values have the same length, and add the missing seconds to those that don't:

trips$ended_at_new<-ifelse(trips$check==19,trips$ended_at,paste(trips$ended_at,":00",sep=""))

You can swap out the 19 for whatever length your datetime format has. Important note: this only works if the seconds are missing at the end of the timestamp, not if the timestamp is shorter than 19 characters for any other reason.
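The steps above stop at a padded character column. A minimal sketch of the whole sequence, including the final conversion, might look like the following. The three-row `trips` data frame is a made-up sample, and `tz = "UTC"` is an assumption; substitute the time zone your data was actually recorded in.

```r
# Hypothetical sample standing in for the real `trips` data frame.
trips <- data.frame(
  ended_at = c("2020-04-29 01:57:00", "2020-04-29 01:57", "2020-04-29 13:05:42"),
  stringsAsFactors = FALSE
)

# See which string lengths occur: 19 = with seconds, 16 = without.
table(nchar(trips$ended_at))

# Append ":00" only where the string has exactly 16 characters.
short <- nchar(trips$ended_at) == 16
trips$ended_at_new <- ifelse(short, paste0(trips$ended_at, ":00"), trips$ended_at)

# Convert the padded strings (tz = "UTC" is an assumption, not from the question).
trips$ended_at_new <- as.POSIXct(trips$ended_at_new,
                                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Anything still NA here was malformed for some other reason.
sum(is.na(trips$ended_at_new))
```

Running `table(nchar(...))` first is a cheap way to confirm that only the two expected lengths exist before padding a million rows.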

CodePudding user response：

This is the approach I took, using ifelse() under the assumption that you are dealing with only two possibilities: with and without seconds.

date_time <- c("2020-01-18 20:12:16", "2020-01-18 20:12")

ifelse(nchar(date_time) == 16, 
       format(as.POSIXct(date_time, format="%Y-%m-%d %H:%M"), "%Y-%m-%d %H:%M:%S"), 
       format(as.POSIXct(date_time, format="%Y-%m-%d %H:%M:%S"), "%Y-%m-%d %H:%M:%S"))
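Note that the ifelse() call above returns a character vector, because ifelse() drops the POSIXct class from its inputs (which is presumably why the branches are wrapped in format()). A short sketch of the remaining step, converting the normalized strings back to POSIXct, could look like this. The names `normalized` and `result` are illustrative, and `tz = "UTC"` is an assumption.

```r
date_time <- c("2020-01-18 20:12:16", "2020-01-18 20:12")

# Same idea as above: normalize every entry to a string that includes seconds.
normalized <- ifelse(
  nchar(date_time) == 16,
  format(as.POSIXct(date_time, format = "%Y-%m-%d %H:%M", tz = "UTC"),
         "%Y-%m-%d %H:%M:%S"),
  format(as.POSIXct(date_time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
         "%Y-%m-%d %H:%M:%S")
)

# `normalized` is character at this point; one more parse gives POSIXct.
result <- as.POSIXct(normalized, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(result)
```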

CodePudding user response：

Two methods:

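Concatenate the literal :00 onto the end of timestamps that only have hours and minutes. The plain conversion leaves an NA for the short timestamp; padding with gsub() before converting fixes it: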

as.POSIXct(trips$ended_at, format = "%Y-%m-%d %H:%M:%S")
# [1] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [3] "2020-04-29 01:57:00 EDT" NA                       
# [5] "2020-04-29 01:57:00 EDT"

gsub("( [0-9]+:[0-9]+)$", "\\1:00", trips$ended_at)
# [1] "2020-04-29 01:57:00" "2020-04-29 01:57:00" "2020-04-29 01:57:00"
# [4] "2020-04-29 01:57:00" "2020-04-29 01:57:00"

as.POSIXct(gsub("( [0-9]+:[0-9]+)$", "\\1:00", trips$ended_at), format = "%Y-%m-%d %H:%M:%S")
# [1] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [3] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [5] "2020-04-29 01:57:00 EDT"

If you have multiple "candidate" formats that you need to try, you can step through them iteratively. Place the most likely candidates first. The loop re-parses only the entries that are still NA, and exits early as soon as every timestamp has been converted.

candidates <- c("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M")
out <- as.POSIXct(trips$ended_at, format = "%Y-%m-%d %H:%M:%S")
for (fmt in candidates) {
  if (!any(isna <- is.na(out))) break  # stop once nothing is left to convert
  out[isna] <- as.POSIXct(trips$ended_at[isna], format = fmt)
}
out
# [1] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [3] "2020-04-29 01:57:00 EDT" "2020-04-29 01:57:00 EDT"
# [5] "2020-04-29 01:57:00 EDT"
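The loop above can also be wrapped in a small function so it is reusable across columns. This is only a sketch: `parse_mixed` is a hypothetical helper name, the sample vector is made up, and `tz = "UTC"` is an assumed default. As far as I know, strptime-style parsing in R ignores trailing characters in the input, so formats with seconds should be listed before their shorter counterparts; otherwise the seconds could be dropped silently.

```r
# Hypothetical helper: try each format in turn on whatever is still NA.
parse_mixed <- function(x, formats, tz = "UTC") {
  out <- as.POSIXct(x, format = formats[1], tz = tz)
  for (fmt in formats[-1]) {
    isna <- is.na(out)
    if (!any(isna)) break            # everything converted, stop early
    out[isna] <- as.POSIXct(x[isna], format = fmt, tz = tz)
  }
  out
}

# Made-up sample with three formats and one unparseable entry.
x <- c("2020-04-29 01:57:00", "2020-04-29 01:57",
       "29/04/2020 01:57:30", "not a date")

formats <- c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M",
             "%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M")

parsed <- parse_mixed(x, formats)

# Positions that matched none of the candidate formats.
which(is.na(parsed))
```

Checking `which(is.na(parsed))` afterwards shows exactly which rows still need attention, which helps on a dataset too large to inspect by eye.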

Data

trips <- data.frame(ended_at = c("2020-04-29 01:57:00", "2020-04-29 01:57:00", "2020-04-29 01:57:00", "2020-04-29 01:57", "2020-04-29 01:57:00"))