Im new to R and am having an issue converting a character format to a date/time format. Trying to get the started_at
and ended_at
columns mutated from character to date/time, but no matter what I have tried, I get the error nas introduced by coercion
or character string is not in a standard unambiguous format
. The intent is to create a new column ride_length
as the difference between the ended_at
and started_at
values in minutes.
I have my df
named sep_2021
.
str(sep_2021)
spec_tbl_df [804,352 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:804352] "9DC7B962304CBFD8" "F930E2C6872D6B32" "6EF72137900BB910" "78D1DE133B3DBF55" ...
$ rideable_type : chr [1:804352] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : chr [1:804352] "9/28/21 16:07" "9/28/21 14:24" "9/28/21 00:20" "9/28/21 14:51" ...
$ ended_at : chr [1:804352] "9/28/21 16:09" "9/28/21 14:40" "9/28/21 00:23" "9/28/21 15:00" ...
$ day_of_week : num [1:804352] 3 3 3 3 3 3 3 3 2 3 ...
$ start_station_name: chr [1:804352] NA NA NA NA ...
$ start_station_id : chr [1:804352] NA NA NA NA ...
$ end_station_name : chr [1:804352] NA NA NA NA ...
$ end_station_id : chr [1:804352] NA NA NA NA ...
$ start_lat : num [1:804352] 41.9 41.9 41.8 41.8 41.9 ...
$ start_lng : num [1:804352] -87.7 -87.6 -87.7 -87.7 -87.7 ...
$ end_lat : num [1:804352] 41.9 42 41.8 41.8 41.9 ...
$ end_lng : num [1:804352] -87.7 -87.7 -87.7 -87.7 -87.7 ...
$ member_casual : chr [1:804352] "casual" "casual" "casual" "casual" ...
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_character(),
.. ended_at = col_character(),
.. day_of_week = col_double(),
.. start_station_name = col_character(),
.. start_station_id = col_character(),
.. end_station_name = col_character(),
.. end_station_id = col_character(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
I have attempted the following --
sep_2021 <- mutate(sep_2021, started_at = as.Date(started_at)
Result: character string is not in a standard unambiguous format
sep_2021 <- mutate(sep_2021, started_at = as.Date.POSIXct(started_at, tz = "", tryFormats = c("%Y-%m-%d %H:%M:%OS","%Y/%m/%d %H:%M:%OS")))
Result: character string is not in a standard unambiguous format
sep_2021 <- mutate(sep_2021, started_at = lubridate::as_datetime(started_at))
Result: All formats failed to parse. No formats found
sep_2021 <- mutate(sep_2021, started_at = as.Date(started_at, "%m-%d-%y %H:%M:%OS"))
Result: NAs introduced by coercion
Any and all suggestions or advice is greatly appreciated!
CodePudding user response:
We may use parse_date
from parsedate
library(dplyr)
library(parsedate)
sep_2021 <- sep_2021 %>%
mutate(across(c(started_at, ended_at), parse_date))
The format
used and the format in the columns are different i.e. it should be %m/%d/%y %H:%M
sep_2021 <- sep_2021 %>%
mutate(across(c(started_at, ended_at), as.POSIXct,
format = "%m/%d/%y %H:%M"))
CodePudding user response:
You may use mdy_hm
to change the class from character to POSIXct
. To calculate difference use difftime
and pass units
to it.
For example, to get difference in seconds you can do -
library(dplyr)
library(lubridate)
sep_2021 <- sep_2021 %>%
mutate(across(c(started_at, ended_at), mdy_hm),
diff = difftime(ended_at, started_at, units = 'secs'))
sep_2021
# started_at ended_at diff
#1 2021-09-28 16:07:00 2021-09-28 16:09:00 120 secs
#2 2021-09-28 14:24:00 2021-09-28 14:40:00 960 secs
#3 2021-09-28 00:20:00 2021-09-28 00:23:00 180 secs
#4 2021-09-28 14:51:00 2021-09-28 15:00:00 540 secs
data
It is easier to help if you provide data in a reproducible format
sep_2021 <- data.frame(started_at = c("9/28/21 16:07", "9/28/21 14:24" ,"9/28/21 00:20" ,"9/28/21 14:51"),
ended_at = c("9/28/21 16:09" ,"9/28/21 14:40", "9/28/21 00:23", "9/28/21 15:00"))