Finding the longest ride in a given dataset

I have a dataset with four columns.

I have to follow the steps below:

Drop all rows where pickup_time or dropoff time is missing
Find the longest ride (on the basis of duration) for each pickup month (YYYY-MM)
Sort the resulting DataFrame by pickup month

I have tried this:

library(dplyr)
longestRide <- function(df)
{
  df <- na.omit(df)
}

Please help me complete steps 2 and 3.

CodePudding user response：

Here's how you can do it with dplyr:

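library(dplyr)

# Assumes pickup_datetime and dropoff_datetime are date-time (POSIXct) columns
longest_ride <- df %>%
  # Step 1: drop rows with a missing pickup or dropoff time
  filter(!is.na(pickup_datetime), !is.na(dropoff_datetime)) %>%
  # Step 2: keep the longest ride within each pickup month (YYYY-MM)
  group_by(pickup_month = format(pickup_datetime, format="%Y-%m")) %>%
  slice(which.max(dropoff_datetime - pickup_datetime)) %>%
  select(pickup_month, id) %>%
  # Step 3: sort by pickup month
  arrange(pickup_month)

longest_ride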
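
If the timestamps are stored as text rather than as date-times, format() will not extract the month and the subtraction will fail, so they probably need parsing first. Here is a rough sketch of that, wrapped in the longestRide function from the question. The toy data, the timestamp layout ("%Y-%m-%d %H:%M:%S"), the UTC time zone and the duration_min column are all assumptions for illustration; they are not taken from the original dataset.

```r
library(dplyr)

# Hypothetical toy data: column names follow the answer above, values are invented
rides <- data.frame(
  id = c("a", "b", "c", "d"),
  pickup_datetime  = c("2016-06-01 10:00:00", "2016-06-02 09:00:00",
                       "2016-07-01 08:00:00", NA),
  dropoff_datetime = c("2016-06-01 10:30:00", "2016-06-02 11:00:00",
                       "2016-07-01 08:15:00", "2016-07-03 12:00:00"),
  stringsAsFactors = FALSE
)

longestRide <- function(df) {
  df %>%
    # Parse text timestamps (the format string and time zone are assumptions)
    mutate(
      pickup_datetime  = as.POSIXct(pickup_datetime,
                                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
      dropoff_datetime = as.POSIXct(dropoff_datetime,
                                    format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
    ) %>%
    # Step 1: drop rows with a missing pickup or dropoff time
    filter(!is.na(pickup_datetime), !is.na(dropoff_datetime)) %>%
    # Step 2: duration in minutes, then the longest ride per pickup month
    mutate(
      pickup_month = format(pickup_datetime, "%Y-%m"),
      duration_min = as.numeric(difftime(dropoff_datetime, pickup_datetime,
                                         units = "mins"))
    ) %>%
    group_by(pickup_month) %>%
    slice(which.max(duration_min)) %>%
    ungroup() %>%
    select(pickup_month, id, duration_min) %>%
    # Step 3: sort by pickup month
    arrange(pickup_month)
}

result <- longestRide(rides)
result
```

With this toy data, ride d is dropped because its pickup time is missing, and the result keeps ride b (120 minutes) for 2016-06 and ride c (15 minutes) for 2016-07. Fixing the units with difftime() keeps the durations comparable even if you later print or export them.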

CodePudding user response：

A solution using data.table:

library(data.table)
setDT(df)

# .SDcols = 1L keeps only the first column (id in the output below)
df[order(pickup_datetime),
   .SD[which.max(dropoff_datetime - pickup_datetime)],
   by = .(pickup_month = format(pickup_datetime, format="%Y-%m")),
   .SDcols = 1L]

#    pickup_month      id
# 1:      2016-06 id01234
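
If neither package is available, the same three steps can probably be done in base R as well. This is only a sketch under assumptions: the toy data is invented, the columns are assumed to be POSIXct already, and duration_min is a helper column added here for illustration.

```r
# Hypothetical toy data with POSIXct columns (values are invented)
fmt <- "%Y-%m-%d %H:%M:%S"
rides <- data.frame(
  id = c("a", "b", "c", "d"),
  pickup_datetime  = as.POSIXct(c("2016-07-01 08:00:00", "2016-06-01 10:00:00",
                                  "2016-06-02 09:00:00", NA),
                                format = fmt, tz = "UTC"),
  dropoff_datetime = as.POSIXct(c("2016-07-01 08:15:00", "2016-06-01 10:30:00",
                                  "2016-06-02 11:00:00", "2016-07-03 12:00:00"),
                                format = fmt, tz = "UTC"),
  stringsAsFactors = FALSE
)

# Step 1: drop rows with a missing pickup or dropoff time
ok <- rides[!is.na(rides$pickup_datetime) & !is.na(rides$dropoff_datetime), ]

# Step 2: duration in minutes and pickup month, then the longest ride per month
ok$duration_min <- as.numeric(difftime(ok$dropoff_datetime, ok$pickup_datetime,
                                       units = "mins"))
ok$pickup_month <- format(ok$pickup_datetime, "%Y-%m")
longest <- do.call(rbind, lapply(split(ok, ok$pickup_month),
                                 function(g) g[which.max(g$duration_min), ]))

# Step 3: sort by pickup month and keep the columns of interest
longest <- longest[order(longest$pickup_month),
                   c("pickup_month", "id", "duration_min")]
rownames(longest) <- NULL
longest
```

Here split() breaks the rows into one data frame per month, which.max() picks the longest ride in each, and rbind() stacks the winners back together. For large datasets the dplyr or data.table versions above are likely to be faster.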