Home > Software design >  Mutating character to date/time in R
Mutating character to date/time in R

Time:10-27

Im new to R and am having an issue converting a character format to a date/time format. Trying to get the started_at and ended_at columns mutated from character to date/time, but no matter what I have tried, I get the error nas introduced by coercion or character string is not in a standard unambiguous format. The intent is to create a new column ride_length as the difference between the ended_at and started_at values in minutes.

I have my df named sep_2021.

str(sep_2021)

spec_tbl_df [804,352 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : chr [1:804352] "9DC7B962304CBFD8" "F930E2C6872D6B32" "6EF72137900BB910" "78D1DE133B3DBF55" ...
 $ rideable_type     : chr [1:804352] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ started_at        : chr [1:804352] "9/28/21 16:07" "9/28/21 14:24" "9/28/21 00:20" "9/28/21 14:51" ...
 $ ended_at          : chr [1:804352] "9/28/21 16:09" "9/28/21 14:40" "9/28/21 00:23" "9/28/21 15:00" ...
 $ day_of_week       : num [1:804352] 3 3 3 3 3 3 3 3 2 3 ...
 $ start_station_name: chr [1:804352] NA NA NA NA ...
 $ start_station_id  : chr [1:804352] NA NA NA NA ...
 $ end_station_name  : chr [1:804352] NA NA NA NA ...
 $ end_station_id    : chr [1:804352] NA NA NA NA ...
 $ start_lat         : num [1:804352] 41.9 41.9 41.8 41.8 41.9 ...
 $ start_lng         : num [1:804352] -87.7 -87.6 -87.7 -87.7 -87.7 ...
 $ end_lat           : num [1:804352] 41.9 42 41.8 41.8 41.9 ...
 $ end_lng           : num [1:804352] -87.7 -87.7 -87.7 -87.7 -87.7 ...
 $ member_casual     : chr [1:804352] "casual" "casual" "casual" "casual" ...
 - attr(*, "spec")=
  .. cols(
  ..   ride_id = col_character(),
  ..   rideable_type = col_character(),
  ..   started_at = col_character(),
  ..   ended_at = col_character(),
  ..   day_of_week = col_double(),
  ..   start_station_name = col_character(),
  ..   start_station_id = col_character(),
  ..   end_station_name = col_character(),
  ..   end_station_id = col_character(),
  ..   start_lat = col_double(),
  ..   start_lng = col_double(),
  ..   end_lat = col_double(),
  ..   end_lng = col_double(),
  ..   member_casual = col_character()
  .. )
 - attr(*, "problems")=<externalptr> 

I have attempted the following --

sep_2021 <- mutate(sep_2021, started_at = as.Date(started_at)

Result: character string is not in a standard unambiguous format

sep_2021 <- mutate(sep_2021, started_at = as.Date.POSIXct(started_at, tz = "", tryFormats = c("%Y-%m-%d %H:%M:%OS","%Y/%m/%d %H:%M:%OS")))

Result: character string is not in a standard unambiguous format

sep_2021 <- mutate(sep_2021, started_at = lubridate::as_datetime(started_at))

Result: All formats failed to parse. No formats found

sep_2021 <- mutate(sep_2021, started_at = as.Date(started_at, "%m-%d-%y %H:%M:%OS"))

Result: NAs introduced by coercion

Any and all suggestions or advice is greatly appreciated!

CodePudding user response:

We may use parse_date from parsedate

library(dplyr)
library(parsedate)
sep_2021 <- sep_2021 %>%
    mutate(across(c(started_at, ended_at), parse_date))

The format used and the format in the columns are different i.e. it should be %m/%d/%y %H:%M

sep_2021 <- sep_2021 %>%
      mutate(across(c(started_at, ended_at), as.POSIXct,
      format = "%m/%d/%y %H:%M"))

CodePudding user response:

You may use mdy_hm to change the class from character to POSIXct. To calculate difference use difftime and pass units to it.

For example, to get difference in seconds you can do -

library(dplyr)
library(lubridate)

sep_2021 <- sep_2021 %>%
  mutate(across(c(started_at, ended_at), mdy_hm), 
         diff = difftime(ended_at, started_at, units = 'secs'))

sep_2021

#           started_at            ended_at     diff
#1 2021-09-28 16:07:00 2021-09-28 16:09:00 120 secs
#2 2021-09-28 14:24:00 2021-09-28 14:40:00 960 secs
#3 2021-09-28 00:20:00 2021-09-28 00:23:00 180 secs
#4 2021-09-28 14:51:00 2021-09-28 15:00:00 540 secs

data

It is easier to help if you provide data in a reproducible format

sep_2021 <- data.frame(started_at = c("9/28/21 16:07", "9/28/21 14:24" ,"9/28/21 00:20" ,"9/28/21 14:51"), 
                  ended_at = c("9/28/21 16:09" ,"9/28/21 14:40", "9/28/21 00:23", "9/28/21 15:00"))
  • Related