I have a dataset with four columns: Dataset
I need to follow the steps below:
- Drop all rows where the pickup time or dropoff time is missing.
- Find the longest ride (by duration) for each pickup month (YYYY-MM).
- Sort the resulting DataFrame by pickup month.
The output should be: Expected Output
I have tried this:
library(dplyr)
longestRide <- function(df) {
  df <- na.omit(df)
}
Please help me complete steps 2 and 3.
CodePudding user response:
Here's how you can do it with dplyr:
library(dplyr)
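longest_ride <- df %>%
  # Step 1: drop rows with a missing pickup or dropoff time
  filter(!is.na(pickup_datetime), !is.na(dropoff_datetime)) %>%
  # Step 2: keep the longest ride in each pickup month (YYYY-MM)
  group_by(pickup_month = format(pickup_datetime, format = "%Y-%m")) %>%
  slice(which.max(dropoff_datetime - pickup_datetime)) %>%
  ungroup() %>%
  select(pickup_month, id) %>%
  # Step 3: sort by pickup month
  arrange(pickup_month)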
longest_ride
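If you want to check the result on something small, here is a rough sketch of the same three steps on made-up data. The sample rows, the extra `duration` column, and the use of `slice_max()` (dplyr 1.0.0 or later) are my own assumptions for illustration. It also assumes both datetime columns are already POSIXct rather than character strings.

```r
library(dplyr)

# Hypothetical sample data; the ids and timestamps are invented for illustration.
df <- data.frame(
  id = c("a", "b", "c", "d"),
  pickup_datetime = as.POSIXct(
    c("2016-06-01 10:00:00", "2016-06-02 09:00:00", "2016-07-01 08:00:00", NA),
    tz = "UTC"),
  dropoff_datetime = as.POSIXct(
    c("2016-06-01 10:30:00", "2016-06-02 11:00:00", "2016-07-01 08:15:00",
      "2016-07-03 12:00:00"),
    tz = "UTC")
)

longest_ride <- df %>%
  # Step 1: drop rows with a missing pickup or dropoff time (row "d" here)
  filter(!is.na(pickup_datetime), !is.na(dropoff_datetime)) %>%
  # Make the duration explicit, in minutes, and derive the YYYY-MM month
  mutate(
    duration = as.numeric(difftime(dropoff_datetime, pickup_datetime, units = "mins")),
    pickup_month = format(pickup_datetime, "%Y-%m")
  ) %>%
  # Step 2: keep exactly one longest ride per month
  group_by(pickup_month) %>%
  slice_max(duration, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(pickup_month, id, duration) %>%
  # Step 3: sort by pickup month
  arrange(pickup_month)

longest_ride
```

On this sample the result should contain ride `b` for 2016-06 (120 minutes) and ride `c` for 2016-07 (15 minutes). Keeping `duration` as a column makes it easy to confirm that the right ride was picked.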
CodePudding user response:
A solution using data.table:
library(data.table)
setDT(df)
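# Drop rows with a missing pickup or dropoff time, then take the longest ride per month
df[!is.na(pickup_datetime) & !is.na(dropoff_datetime)][
  order(pickup_datetime),
  .SD[which.max(dropoff_datetime - pickup_datetime)],
  by = .(pickup_month = format(pickup_datetime, format = "%Y-%m")),
  .SDcols = "id"]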
# pickup_month id
# 1: 2016-06 id01234