I have a variable that needs to be converted to military time. It is messy because its values don't follow a consistent format.
Here is an example of what might be found in the variable.
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")
So far I've had some success getting everything into a more consistent format with regex:
x <- str_extract(x, "[0-9]+.[0-9]+|[0-9][0-9][:][0-9][0-9]")
x <- str_remove(x, "[:]")
x <- str_remove(x, "[:][0-9][0-9]$")
As you can see I get: "0.9305555555555547", "1520", "1000", "0.125", "1525"
The problem is that the decimals need to be multiplied by 2400 to get back to military time, but I don't want to multiply the integers (those are already in military time).
x is essentially a variable (column) in a data frame.
I was thinking about using if/else logic but I don't know how to implement that.
To clarify, I want:
Input: "0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 15:25"
Output: "2233", "1520", "1000", "0300", "1525"
Answer:
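After the pre-processing you did with your regexes, you can implement the if/else logic with ifelse() and str_detect():
library(stringr)   # str_detect()
library(magrittr)  # %>%
x <- ifelse(str_detect(x, "\\."),
            as.integer(as.numeric(x) * 2400),  # decimal fraction of a day
            as.integer(x)) %>%                  # already HHMM
  sprintf("%04d", .)                            # zero-pad to 4 digits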
This returns your desired output as a character vector.
Then you could do something like this to parse it to POSIXct:
x <- as.POSIXct(x,
format = "%H%M",
origin = "1970-01-01",
tz = "UTC")
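Since x is a column in a data frame in the question, the same ifelse logic can also run inside dplyr::mutate(). A minimal sketch: the data frame df and the column names time_raw and time_mil are hypothetical, and time_raw is assumed to already hold the strings produced by the regex pre-processing.

```r
library(dplyr)    # mutate(), if_else(), %>%
library(stringr)  # str_detect()

# hypothetical data frame holding the already pre-processed strings
df <- data.frame(time_raw = c("0.9305555555555547", "1520", "1000", "0.125", "1525"),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(time_mil = sprintf("%04d",
                            if_else(str_detect(time_raw, "\\."),
                                    as.integer(as.numeric(time_raw) * 2400),  # day fraction
                                    as.integer(time_raw))))                   # already HHMM
df$time_mil
# [1] "2233" "1520" "1000" "0300" "1525"
```

dplyr's if_else() is used instead of base ifelse() because it checks that both branches have the same type (integer here).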
Answer:
I followed your logic literally and got exactly the output you asked for:
- Convert to numeric
- Multiply the decimals (values below 1) by 2400 and convert back to character
- Use a for loop to find the dot in each string and drop everything from it onward
- Use a for loop to prepend 0 to strings shorter than 4 characters
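library(stringr)  # str_extract(), str_remove(), str_sub()
library(stringi)  # stri_locate_first_regex()
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")
x <- str_extract(x, "[0-9]+.[0-9]+|[0-9][0-9][:][0-9][0-9]")
x <- str_remove(x, "[:]")
x <- str_remove(x, "[:][0-9][0-9]$")
x <- as.numeric(x)
# only the day fractions (< 1) are scaled; HHMM values are kept as-is
x <- as.character(ifelse(x < 1, x * 2400, x))
# drop the decimal part, e.g. "2233.33333333333" -> "2233"
for (i in seq_along(x)) {
  ii <- stri_locate_first_regex(x[i], "\\.")[1]
  if (!is.na(ii)) {
    x[i] <- str_sub(x[i], 1, ii - 1)
  }
}
# left-pad with zeros to 4 characters, e.g. "300" -> "0300"
for (i in seq_along(x)) {
  while (nchar(x[i]) < 4) {
    x[i] <- paste0("0", x[i])
  }
}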
x
[1] "2233" "1520" "1000" "0300" "1525"
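The two for loops can also be replaced with vectorised calls: as.integer() truncates the decimals the same way the first loop does, and sprintf("%04d", ...) does the zero-padding of the second. A sketch, assuming x is the numeric vector produced by the as.numeric(x) step above:

```r
# numeric values as they look right after as.numeric(x)
x <- c(0.9305555555555547, 1520, 1000, 0.125, 1525)

x <- ifelse(x < 1, x * 2400, x)   # scale only the day fractions
out <- sprintf("%04d", as.integer(x))  # truncate decimals, then zero-pad to 4 digits
out
# [1] "2233" "1520" "1000" "0300" "1525"
```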
Answer:
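Edit: I realized that this premise from the question:
"the decimals need to be multiplied by 2400 to get back to military time"
isn't quite right. In HHMM notation the minutes are base 60 (sexagesimal), not base 10, so multiplying a fraction of a day by 2400 puts hundredths of an hour, not minutes, in the last two digits. For example, 0.9305555555555547 × 24 = 22.333 hours, which is 22:20, not 22:33. I've changed my code accordingly.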
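To make the arithmetic concrete: a day fraction times 1440 gives minutes since midnight, which then have to be split into hours and minutes in base 60. A minimal base R sketch; the helper name frac_to_hhmm() is hypothetical and it rounds to the nearest minute.

```r
# hypothetical helper: fraction of a day -> "HHMM" string
frac_to_hhmm <- function(f) {
  mins <- round(f * 1440)                        # whole minutes since midnight
  sprintf("%02d%02d", mins %/% 60, mins %% 60)   # split into hours and base-60 minutes
}

frac_to_hhmm(c(0.9305555555555547, 0.125))
# [1] "2220" "0300"

# the naive decimal scaling, for comparison
0.9305555555555547 * 2400
# [1] 2233.333
```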
Rather than applying the same regex to everything right out of the gate, I would first determine the format of each element of x, then process each element accordingly.
I like the hms package for working with times and use it below, but you could also use the base R POSIXct or POSIXlt classes.
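For the string formats, base R's strptime() (which returns POSIXlt) can do the parsing by trying several formats in turn. A minimal sketch: the helper name parse_base() and the list of candidate formats are assumptions chosen to cover the sample strings, and the %p directive assumes an English or C locale.

```r
# hypothetical helper: try each format in order, keep the first successful parse
parse_base <- function(s) {
  # the AM/PM format must come first: strptime() ignores trailing characters,
  # so "%H:%M" would also accept "10:00:00 PM" and silently drop the PM
  formats <- c("%I:%M:%S %p", "%H:%M", "%H%M")
  out <- rep(NA_character_, length(s))
  for (f in formats) {
    todo <- is.na(out)  # only re-parse elements that are still unresolved
    out[todo] <- format(strptime(s[todo], f, tz = "UTC"), "%H%M")
  }
  out
}

parse_base(c("15:20", "10:00:00 AM", "10:00:00 PM", "1525"))
# [1] "1520" "1000" "2200" "1525"
```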
library(tidyverse)
library(hms)
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")
# ----
# define functions for parsing each possible format in `x`
parse_decimal <- function(x) {
  hrs <- as.numeric(str_extract(x, "^0\\.\\d+")) * 24  # fraction of a day -> hours
  min <- (hrs %% 1) * 60                                # fractional hour -> minutes
  hrs <- floor(hrs)
  hms(hours = hrs, minutes = min)
}
parse_timestring <- function(x) {
  out <- str_remove_all(x, "\\D")                   # digits only, e.g. "100000"
  hr_digits <- if_else(str_length(out) == 3, 1, 2)  # e.g. "930" has one hour digit
  hrs <- as.numeric(str_sub(out, end = hr_digits))
  hrs <- if_else(str_detect(str_to_upper(x), "P\\.?M"), hrs + 12, hrs)
  # take only the two minute digits so any seconds are ignored
  min <- as.numeric(str_sub(out, start = hr_digits + 1, end = hr_digits + 2))
  hms(hours = hrs, minutes = min)
}
# ----
# test each element of x and pass it to the appropriate parsing function
time <- case_when(
str_detect(x, "^[01]$|^0\\.\\d") ~ parse_decimal(x),
str_detect(x, "\\d{1,2}:?\\d{2}") ~ parse_timestring(x),
TRUE ~ NA_real_
)
time
# 22:20:00.000000
# 15:20:00.000000
# 10:00:00.000000
# 03:00:00.000000
# 15:25:00.000000
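If you need the HHMM strings rather than hms objects, the result can be converted back, since an hms value stores seconds since midnight. A sketch; the hms() call below is only a stand-in that rebuilds the five times shown above.

```r
library(hms)

# stand-in for the `time` vector computed above
time <- hms(hours = c(22, 15, 10, 3, 15), minutes = c(20, 20, 0, 0, 25))

secs <- round(as.numeric(time))  # seconds since midnight; rounding drops float noise
hhmm <- sprintf("%02d%02d", secs %/% 3600, (secs %% 3600) %/% 60)
hhmm
# [1] "2220" "1520" "1000" "0300" "1525"
```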
Answer:
We extend the regex to capture AM/PM indicators so that they are not lost. Then we handle each subset in turn (decimal time, 12-hour AM/PM time, and 24-hour time with a colon) and return the result.
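milt <- function(x) {
  # keep digits, separators and any AM/PM marker; drop words such as "Found"
  u <- trimws(gsub('\\D*(\\d*\\W?[AP]?M?)\\D*', '\\1', x))
  # decimal day fractions: scale by 2400 and zero-pad to 4 digits
  u[grep('\\.', u)] <- sprintf('%04d', round(as.double(u[grep('\\.', u)]) * 2400))
  # 12-hour times: parse with %I/%p and reformat as 24-hour HHMM
  u[grep('[AP]M', u)] <- strftime(strptime(u[grep('[AP]M', u)], '%I:%M:%S %p'), '%H%M')
  # remaining colon-separated 24-hour times: drop the colon
  u[grep(':', u)] <- gsub(':', '', u[grep(':', u)])
  return(u)
}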
milt(x)
# [1] "2233" "1520" "1000" "2200" "0300" "1525" "0000" "1020"
Data:
x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found",
"10:00:00 PM Found", "0.125", "Found 1525", "0000", "10:20")