I have a dataset that includes two columns (started_at, ended_at). Both columns were of type chr, so I converted them to dttm using this code:
df[["started_at"]] <- as.POSIXct(df[["started_at"]], format= "%Y-%m-%d %H:%M:%S") %>% ymd_hms()
df[["ended_at"]] <- as.POSIXct(df[["ended_at"]], format= "%Y-%m-%d %H:%M:%S") %>% ymd_hms()
Then I tried to get the difference between the two columns with this code:
df1 <- df %>%
difftime(ended_at,started_at, units = 'mins')
But I receive this error:
Error in as.POSIXct.default(time1, tz = tz) :
do not know how to convert 'time1' to class “POSIXct”
Are there any tips for solving this issue?
Data frame:
head(df)
ride_id rideable_type started_at ended_at start_station_n~ start_station_id end_station_name
<chr> <chr> <dttm> <dttm> <chr> <chr> <chr>
1 CFA86D4455AA1~ classic_bike 2021-03-16 08:32:30 2021-03-16 08:36:34 Humboldt Blvd &~ 15651 Stave St & Armi~
2 30D9DC61227D1~ classic_bike 2021-03-28 01:26:28 2021-03-28 01:36:55 Humboldt Blvd &~ 15651 Central Park Av~
3 846D87A15682A~ classic_bike 2021-03-11 21:17:29 2021-03-11 21:33:53 Shields Ave & 2~ 15443 Halsted St & 35~
4 994D05AA75A16~ classic_bike 2021-03-11 13:26:42 2021-03-11 13:55:41 Winthrop Ave & ~ TA1308000021 Broadway & Sher~
5 DF7464FBE92D8~ classic_bike 2021-03-21 09:09:37 2021-03-21 09:27:33 Glenwood Ave & ~ 525 Chicago Ave & S~
6 CEBA8516FD17F~ classic_bike 2021-03-20 11:08:47 2021-03-20 11:29:39 Glenwood Ave & ~ 525 Chicago Ave & S~
# ... with 6 more variables: end_station_id <chr>, start_lat <dbl>, start_lng <dbl>, end_lat <dbl>, end_lng <dbl>,
# member_casual <chr>
dput(head(df[, c( 3,4)]))
structure(list(started_at = structure(c(1615883550, 1616894788,
1615497449, 1615469202, 1616317777, 1616238527), tzone = "UTC", class = c("POSIXct",
"POSIXt")), ended_at = structure(c(1615883794, 1616895415, 1615498433,
1615470941, 1616318853, 1616239779), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
Answer:
You are missing a mutate().
library(dplyr)
df %>%
as_tibble() %>%
mutate(diff = difftime(ended_at, started_at, units = 'mins'))
#> # A tibble: 6 x 3
#> started_at ended_at diff
#> <dttm> <dttm> <drtn>
#> 1 2021-03-16 08:32:30 2021-03-16 08:36:34 4.066667 mins
#> 2 2021-03-28 01:26:28 2021-03-28 01:36:55 10.450000 mins
#> 3 2021-03-11 21:17:29 2021-03-11 21:33:53 16.400000 mins
#> 4 2021-03-11 13:26:42 2021-03-11 13:55:41 28.983333 mins
#> 5 2021-03-21 09:09:37 2021-03-21 09:27:33 17.933333 mins
#> 6 2021-03-20 11:08:47 2021-03-20 11:29:39 20.866667 mins
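With the pipe you say "use the result of the left-hand side and insert it as the first argument of the function on the right-hand side". Therefore your initial code is equivalent to the "unpiped" call below, in which difftime() receives the whole data frame as time1 and cannot convert it to POSIXct: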
# your code
df %>% difftime(ended_at, started_at, units = 'mins')
# "unpiped" version of your code which does not make sense as-is
difftime(df, ended_at, started_at, units = 'mins')
# either use mutate as shown above or use the following
difftime(df$ended_at, df$started_at, units = 'mins')
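To check this yourself without dplyr, here is a minimal base R sketch. It assumes a small data frame rebuilt from the first three rows of the dput() output in the question; the variable name `failed` and the use of with() are illustrative choices, not part of the original code.

```r
# Rebuild the first three rows from the dput() output in the question
df <- data.frame(
  started_at = as.POSIXct(c(1615883550, 1616894788, 1615497449),
                          origin = "1970-01-01", tz = "UTC"),
  ended_at   = as.POSIXct(c(1615883794, 1616895415, 1615498433),
                          origin = "1970-01-01", tz = "UTC")
)

# Handing the whole data frame to difftime() as its first argument fails,
# which is what the pipe did in the original code
failed <- tryCatch(
  {
    difftime(df, df$ended_at, units = "mins")
    FALSE
  },
  error = function(e) TRUE
)

# with() evaluates the expression inside df, so the column names resolve
df$diff <- with(df, difftime(ended_at, started_at, units = "mins"))
```

If you later need plain numbers instead of a difftime column, as.numeric(df$diff, units = "mins") should give the durations in minutes.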