I have a df with two columns (started_at,ended_at). I converted them from chr to dttm
df <- df %>% mutate(across(c(started_at, ended_at), ~ as.POSIXct(.x, format = "%Y-%m-%d %H:%M:%S") %>% ymd_hms()))
but later i found that in each column that dates is registered in two formats, for example; one of them is in Year-Month-Day H:M:S format and the other one is in Day-month-Year H:M:S format.
# dummy dataframe example
id <- c(1:4)
started_at <- c("2020-01-30 11:11:11", "2020-01-29 11:11:11" , "2020-01-28 11:11:11" , "13-01-20 11:11:11" )
ended_at <- c("2020-01-31 11:11:11", "2020-02-02 11:11:11" , "2020-01-29 11:11:11" , "14-01-20 11:11:11")
bike_type <- c("E","C","E","E")
df <- data.frame(id,started_at,ended_at,bike_type, stringsAsFactors = F)
I want a method to change both columns into a ymd_hms format. Thanks in advance.
CodePudding user response:
Package lubridate
has a function parse_date_time
that can parse many formmats in one call. But in the code below the formats must be in the right order, if the formats' order is swapped the results are wrong.
suppressPackageStartupMessages({
library(dplyr)
library(lubridate)
})
id <- c(1:4)
started_at <- c("2020-01-30 11:11:11", "2020-01-29 11:11:11" , "2020-01-28 11:11:11" , "13-01-20 11:11:11" )
ended_at <- c("2020-01-31 11:11:11", "2020-02-02 11:11:11" , "2020-01-29 11:11:11" , "14-01-20 11:11:11")
bike_type <- c("E","C","E","E")
df <- data.frame(id,started_at,ended_at,bike_type, stringsAsFactors = F)
df %>% mutate(across(c(started_at, ended_at), ~ parse_date_time(.x, orders = c("dmy HMS", "ymd HMS"))))
#> id started_at ended_at bike_type
#> 1 1 2020-01-30 11:11:11 2020-01-31 11:11:11 E
#> 2 2 2020-01-29 11:11:11 2020-02-02 11:11:11 C
#> 3 3 2020-01-28 11:11:11 2020-01-29 11:11:11 E
#> 4 4 2020-01-13 11:11:11 2020-01-14 11:11:11 E
Created on 2022-04-03 by the reprex package (v2.0.1)
Or simpler, using the dots argument to across
.
df %>% mutate(across(c(started_at, ended_at), parse_date_time, orders = c("dmy HMS", "ymd HMS")))
CodePudding user response:
Since you have dates in different orders in the same columns, we can create new columns with both orders (i.e., dmy
and ymd
). The dmy
will just return NA
for those that are not in that order (i.e., that are in ymd
). Next, we convert to ymd
, then we can combine (via coalesce
) these 2 columns together each for the start and end dates. Then, just select the original column names.
library(tidyverse)
library(lubridate)
df %>%
mutate(across(c(started_at, ended_at), ~ dmy_hms(.x), .names = "{.col}_dmy"),
across(c(started_at, ended_at), ~ ymd_hms(.x), .names = "{.col}_ymd"),
started_at = coalesce(started_at_dmy, started_at_ymd),
ended_at = coalesce(ended_at_dmy, ended_at_ymd)) %>%
select(names(df))
Output
id started_at ended_at bike_type
1 1 2020-01-30 11:11:11 2020-01-31 11:11:11 E
2 2 2020-01-29 11:11:11 2020-02-02 11:11:11 C
3 3 2020-01-28 11:11:11 2020-01-29 11:11:11 E
4 4 2020-01-13 11:11:11 2020-01-14 11:11:11 E
However, if the last row is supposed to be 2013 and 2014 instead of 2020, then you can just use lubridate
.
df %>%
mutate(across(c(started_at, ended_at), ~ ymd_hms(.x)))
Output
id started_at ended_at bike_type
1 1 2020-01-30 11:11:11 2020-01-31 11:11:11 E
2 2 2020-01-29 11:11:11 2020-02-02 11:11:11 C
3 3 2020-01-28 11:11:11 2020-01-29 11:11:11 E
4 4 2013-01-20 11:11:11 2014-01-20 11:11:11 E
Data
df <- structure(list(id = 1:4, started_at = c("2020-01-30 11:11:11",
"2020-01-29 11:11:11", "2020-01-28 11:11:11", "13-01-20 11:11:11"
), ended_at = c("2020-01-31 11:11:11", "2020-02-02 11:11:11",
"2020-01-29 11:11:11", "14-01-20 11:11:11"), bike_type = c("E",
"C", "E", "E")), class = "data.frame", row.names = c(NA, -4L))