I have a sizable .csv file and the data types of the columns need to be changed. They are currently character type. Column "started_at" is below:
$ started_at <chr> "4/26/20 17:45", "4/17/20 17:08", "4/1/20 17:54…
Summary:
started_at
Length:3353381
Class:character
Mode:character
I need to convert this to datetime and have tried: I've tried:
mutate(started_at = ymd_hms(as_datetime(started_at))
and
all_trips4 = all_trips3 %>% mutate( started_at = mdy_hm("%m/%d/%Y% %H:%M"))
and I keep getting warnings such as and I get NAs down the whole column:
Warning: All formats failed to parse. No formats found.
Warning in as.POSIXlt.POSIXct(x, tz) : unknown timezone '%y/%m/%d'
I also need to change this column to numeric:
$ ride_length <chr> "0:27:00", "0:09:00"
Please help.
CodePudding user response:
ymd_hms
isn't going to work unless the format is year, month, day, hour, minute, second. Your examples seem to be month/day/year hour:minute.
So try mdy_hm
instead. Showing just for one value here, but you would use dplyr::mutate
:
library(lubridate)
mdy_hm("4/26/20 17:45")
[1] "2020-04-26 17:45:00 UTC"
The second question: I'm assuming ride_length
is measured in hours, minutes and seconds. So try hms
, then as.numeric
converts to seconds:
as.numeric(hms("0:27:00"))
[1] 1620
EDIT - your comments indicate that you are applying the code incorrectly to your data frame.
To be more explicit:
library(dplyr)
library(lubridate)
df1 <- data.frame(started_at = c("4/26/20 17:45"),
ended_at = c("4/26/20 18:12"),
run_length = c("0:27:00"))
df1
started_at ended_at run_length
1 4/26/20 17:45 4/26/20 18:12 0:27:00
df1 <- df1 %>%
mutate(started_at = mdy_hm(started_at),
ended_at = mdy_hm(ended_at),
run_length = as.numeric(hms(run_length)))
df1
started_at ended_at run_length
1 2020-04-26 17:45:00 2020-04-26 18:12:00 1620