Convert chr to dttm-CodePudding

I have a df with two columns (started_at,ended_at). I converted them from chr to dttm

df <- df %>% mutate(across(c(started_at, ended_at), ~ as.POSIXct(.x, format = "%Y-%m-%d %H:%M:%S") %>% ymd_hms()))

but later i found that in each column that dates is registered in two formats, for example; one of them is in Year-Month-Day H:M:S format and the other one is in Day-month-Year H:M:S format.


# dummy dataframe example
id <- c(1:4) 
started_at <- c("2020-01-30 11:11:11", "2020-01-29 11:11:11" , "2020-01-28 11:11:11" , "13-01-20 11:11:11" )
ended_at <- c("2020-01-31 11:11:11", "2020-02-02 11:11:11" , "2020-01-29 11:11:11" , "14-01-20 11:11:11") 
bike_type <- c("E","C","E","E")

df <- data.frame(id,started_at,ended_at,bike_type, stringsAsFactors = F)

I want a method to change both columns into a ymd_hms format. Thanks in advance.

CodePudding user response：

Package lubridate has a function parse_date_time that can parse many formmats in one call. But in the code below the formats must be in the right order, if the formats' order is swapped the results are wrong.

suppressPackageStartupMessages({
  library(dplyr)
  library(lubridate)
})

id <- c(1:4) 
started_at <- c("2020-01-30 11:11:11", "2020-01-29 11:11:11" , "2020-01-28 11:11:11" , "13-01-20 11:11:11" )
ended_at <- c("2020-01-31 11:11:11", "2020-02-02 11:11:11" , "2020-01-29 11:11:11" , "14-01-20 11:11:11") 
bike_type <- c("E","C","E","E")

df <- data.frame(id,started_at,ended_at,bike_type, stringsAsFactors = F)

df %>% mutate(across(c(started_at, ended_at), ~ parse_date_time(.x, orders = c("dmy HMS", "ymd HMS"))))
#>   id          started_at            ended_at bike_type
#> 1  1 2020-01-30 11:11:11 2020-01-31 11:11:11         E
#> 2  2 2020-01-29 11:11:11 2020-02-02 11:11:11         C
#> 3  3 2020-01-28 11:11:11 2020-01-29 11:11:11         E
#> 4  4 2020-01-13 11:11:11 2020-01-14 11:11:11         E

^{Created on 2022-04-03 by the reprex package (v2.0.1)}

Or simpler, using the dots argument to across.

df %>% mutate(across(c(started_at, ended_at), parse_date_time, orders = c("dmy HMS", "ymd HMS")))

CodePudding user response：

Since you have dates in different orders in the same columns, we can create new columns with both orders (i.e., dmy and ymd). The dmy will just return NA for those that are not in that order (i.e., that are in ymd). Next, we convert to ymd, then we can combine (via coalesce) these 2 columns together each for the start and end dates. Then, just select the original column names.

library(tidyverse)
library(lubridate)

df %>%
  mutate(across(c(started_at, ended_at), ~ dmy_hms(.x), .names = "{.col}_dmy"),
         across(c(started_at, ended_at), ~ ymd_hms(.x), .names = "{.col}_ymd"),
         started_at = coalesce(started_at_dmy, started_at_ymd),
         ended_at = coalesce(ended_at_dmy, ended_at_ymd)) %>% 
  select(names(df))

Output

  id          started_at            ended_at bike_type
1  1 2020-01-30 11:11:11 2020-01-31 11:11:11         E
2  2 2020-01-29 11:11:11 2020-02-02 11:11:11         C
3  3 2020-01-28 11:11:11 2020-01-29 11:11:11         E
4  4 2020-01-13 11:11:11 2020-01-14 11:11:11         E

However, if the last row is supposed to be 2013 and 2014 instead of 2020, then you can just use lubridate.

df %>%
    mutate(across(c(started_at, ended_at), ~ ymd_hms(.x)))

Output

  id          started_at            ended_at bike_type
1  1 2020-01-30 11:11:11 2020-01-31 11:11:11         E
2  2 2020-01-29 11:11:11 2020-02-02 11:11:11         C
3  3 2020-01-28 11:11:11 2020-01-29 11:11:11         E
4  4 2013-01-20 11:11:11 2014-01-20 11:11:11         E

Data

df <- structure(list(id = 1:4, started_at = c("2020-01-30 11:11:11", 
"2020-01-29 11:11:11", "2020-01-28 11:11:11", "13-01-20 11:11:11"
), ended_at = c("2020-01-31 11:11:11", "2020-02-02 11:11:11", 
"2020-01-29 11:11:11", "14-01-20 11:11:11"), bike_type = c("E", 
"C", "E", "E")), class = "data.frame", row.names = c(NA, -4L))