How to format a column to show time in R

Time:02-12

I have a variable that needs to be converted to military time. This variable is very messy because it lacks consistency in the format.

Here is an example of what might be found in the variable.

x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")

So far I had some success in getting everything in a more consistent format with RegEx:

x <- str_extract(x, "[0-9]+.+[0-9]|[0-9][0-9][:][0-9][0-9]")
x <- str_remove(x, "[:]") 
x <- str_remove(x, "[:][0-9][0-9]$")

As you can see I get: "0.9305555555555547", "1520", "1000", "0.125", "1525"

The problem is the decimals need to be multiplied by 2400 to get back to military time but I also don't want to multiply integers (since those are already in military time).

x is essentially a variable in a dataframe.

I was thinking about using if/else logic but I don't know how to implement that.

To clarify, I want:

Input: "0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 15:25"

Output: "2233", "1520", "1000", "0300", "1525"

Thank you!

CodePudding user response:

After the pre-processing you did with your regexes, you can implement the if/else logic with ifelse() and str_detect():

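# Values containing a "." are decimals and get multiplied by 2400;
# everything else is already in military time.
x <- ifelse(str_detect(x, "\\."),
            as.integer(as.numeric(x) * 2400),
            as.integer(x)) %>% 
  sprintf("%04d", .)  # zero-pad to four digits, e.g. 300 -> "0300"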

This returns your desired output as a character vector.
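Since x is really a column in a data frame, the whole thing could be wrapped in one function and applied to that column. This is only a sketch: `to_military()`, `df` and `raw` are names made up for illustration, and it uses base R equivalents (`regexpr()`, `sub()`, `grepl()`) of the stringr calls so that it runs without extra packages.

```r
# Hypothetical helper, not part of the original answer.
to_military <- function(x) {
  x <- as.character(x)

  # Extract the first run that starts and ends with a digit
  hit <- regexpr("[0-9]+.+[0-9]", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[hit > 0] <- regmatches(x, hit)

  out <- sub(":", "", out, fixed = TRUE)  # "15:20"   -> "1520"
  out <- sub(":[0-9]{2}$", "", out)       # "1000:00" -> "1000"

  # Decimals are multiplied by 2400, everything else is left alone
  num <- as.numeric(out)
  is_decimal <- grepl(".", out, fixed = TRUE)
  num <- ifelse(is_decimal, num * 2400, num)

  # Truncate to an integer and zero-pad; unmatched inputs stay NA
  ifelse(is.na(num), NA_character_, sprintf("%04d", as.integer(num)))
}

df <- data.frame(
  raw = c("0.9305555555555547", "15:20 Found",
          "10:00:00 AM Found", "0.125", "Found 1525"),
  stringsAsFactors = FALSE
)
df$time <- to_military(df$raw)
df$time
# "2233" "1520" "1000" "0300" "1525"
```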

Then you could do something like this to parse it to POSIXct. Only a time is supplied, so the date part defaults to the current date:

x <- as.POSIXct(x,
         format = "%H%M",
         origin = "1970-01-01",
         tz = "UTC")
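To see what that parse gives you, here is a small sketch. The `parsed` name is mine, and I left out `origin` because it is only needed for numeric input, not when parsing character input with a `format`. A POSIXct value always carries a date, so format() is used to show only the time part again.

```r
x <- c("2233", "1520", "1000", "0300", "1525")

# Parse as hour/minute; the missing date is filled in with the current date
parsed <- as.POSIXct(x, format = "%H%M", tz = "UTC")

# Show only the time of day again
format(parsed, "%H:%M")
# "22:33" "15:20" "10:00" "03:00" "15:25"
```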

CodePudding user response:

I followed your logic literally and got the exact same result.

  1. Numeric conversion
  2. Multiply by 2400 and back to character
  3. for loop to detect dots in the character and delete after that
  4. for loop to put 0 in front of numbers with fewer than 4 characters

library(stringr)
library(stringi)  # for stri_locate_first_regex()

x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")

x <- str_extract(x, "[0-9]+.+[0-9]|[0-9][0-9][:][0-9][0-9]")
x <- str_remove(x, "[:]")
x <- str_remove(x, "[:][0-9][0-9]$")

x <- as.numeric(x)

x <- as.character(ifelse(x<1,x*2400,x))

for(i in 1:length(x)){
  
  ii <- stri_locate_first_regex(x[i],"\\.")[1]
  
  if(!is.na(ii)){
    
    x[i] <- str_sub(x[i],1,ii-1)
    
  }
  
}

for(i in 1:length(x)){
  
  while (nchar(x[i])<4) {
    
    x[i] <- paste0("0",x[i])
    
  }
  
}

x
[1] "2233" "1520" "1000" "0300" "1525"
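For what it is worth, the two loops can probably be replaced with vectorised calls. A sketch in base R, starting from the numeric vector you have after as.numeric() (the values are typed in by hand here):

```r
# Values as they look after the regex clean-up and as.numeric()
x <- c(0.9305555555555547, 1520, 1000, 0.125, 1525)

# Same if/else step as above
x <- ifelse(x < 1, x * 2400, x)

# trunc() drops everything after the decimal point (the first loop),
# and "%04d" pads with leading zeros to four characters (the second loop)
x <- sprintf("%04d", as.integer(trunc(x)))
x
# "2233" "1520" "1000" "0300" "1525"
```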

CodePudding user response:

Rather than using the same regex on everything right out the gate, I would first determine the format of each element of x, then process each element into a common format, which can then be converted to a time.

I like the hms library for working with times and use it below, but you could also use the base POSIXct or POSIXlt class.

library(tidyverse)
library(hms)

x <- c("0.9305555555555547", "15:20 Found", "10:00:00 AM Found", "0.125", "Found 1525")

# ----
# define functions for parsing each possible format in `x`
parse_decimal <- function(x) as.numeric(str_extract(x, "^0\\.\\d+")) * 2400
parse_24hr <- function(x) {
  out <- str_remove_all(x, "\\D")
  if_else(str_length(out) == 6, str_sub(out, end = 4), out) %>% 
    as.numeric()
}
parse_12hr <- function(x) {
  # `%% 1200` maps 12:xx to 00:xx, so 12 AM becomes 00xx and 12 PM stays 12xx
  out <- parse_24hr(x) %% 1200
  if_else(str_detect(str_to_upper(x), "PM"), out + 1200, out)
}

# ----
# test each element of x, and pass to parsing Fx accordingly
time <- case_when(
  str_detect(x, "^[01]$|^0\\.\\d") ~ parse_decimal(x),
  str_detect(str_to_upper(x), "[AP]M") ~ parse_12hr(x),
  str_detect(x, "\\d{1,2}:?\\d{2}") ~ parse_24hr(x),
  TRUE ~ NA_real_
)

# ----
# results of above will be numbers from 0 to 2359
# use `floor` and modulo (`%%`) to extract # of hours and minutes, 
# then convert to `hms`.
time <- hms(
  hours = floor(time / 100), 
  minutes = time %% 100
)
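
One caveat on the decimals, which is an assumption on my part since the question does not say where they come from: values like 0.125 look like spreadsheet-style fractions of a day. If that is what they are, multiplying by 2400 only gives a valid clock time when the result lands on a whole hour (0.125 gives 0300). For 0.9305555555555547 it gives 2233.33, whereas that fraction of a day is 22 hours 20 minutes. A sketch of converting day fractions via total minutes instead (the variable names are made up):

```r
frac <- c(0.9305555555555547, 0.125)

# Fraction of a day -> whole minutes since midnight
# (rounded to absorb floating-point noise)
mins <- round(frac * 24 * 60)

# Split into hours and minutes and zero-pad each to two digits
hhmm <- sprintf("%02d%02d", as.integer(mins %/% 60), as.integer(mins %% 60))
hhmm
# "2220" "0300"
```

If "2233" really is the value you want for that row, keep the `* 2400` version.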