Below is a fictional reproducible example of the pick-up and drop-off times of four taxis. Taxis 1, 2, and 3 unfortunately each have a missing drop-off time. Fortunately, two of these times (for taxis 1 and 3) can be inferred to be at least 1 second before the taxi picks up new customers (these are non-ride-sharing taxis, very corona-proof):
(In the real use case, the df below is the result of a group_by and summarise on another df.)
library(dplyr)
library(tidyr) # provides fill(), used below
x <- seq(as.POSIXct('2020/01/01'), # Create sequence of date-times
         as.POSIXct('2030/01/01'),
         by = "10 mins") %>%
  head(20) %>%
  sort()
taxi_nr <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)
drop_of <- x[c(TRUE, FALSE)]
pick_up <- x[c(FALSE, TRUE)]
drop_of[2] <- NA
drop_of[5] <- NA
drop_of[7] <- NA
df <- data.frame(taxi_nr, pick_up, drop_of) %>%
  arrange(pick_up)
I wish to fill in the NAs for taxis 1 and 3. I have tried the following:
df <- df %>%
  fill(drop_of, .direction = "up")
However, this takes the drop-off value from the row below instead of the pick-up value from the row below, and it does not take the taxi number into account.
I have also thought about:
df <- df %>%
  filter(is.na(drop_of)) %>%
  mutate(drop_of, ov[, 1])
This seems to run into problems with the taxi_nr 2 case, as there is no [, 1] within the group, or so I believe. I have tried adding safely(), possibly() and quietly(), but that did not help:
df <- df %>%
  filter(is.na(drop_of)) %>%
  mutate(drop_of, purrr::safely(ov[, 1]))
Does anyone have a solution?
PS: once I have the right column for filling in, 1 second also needs to be subtracted from it, and the result needs to be in the right lubridate format (d/m/y-h/m/s).
THANKS!
CodePudding user response:
You can try using a temporary variable for this, although it does not look pretty:
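# fill() comes from tidyr, so load it alongside dplyr
library(tidyr)
df <- df %>%
  # keep pick_up only on rows with a known drop_of; ifelse() returns plain numeric seconds
  mutate(temp = ifelse(is.na(drop_of), NA, pick_up)) %>%
  group_by(taxi_nr) %>%
  # within each taxi, copy the pick-up of the next ride with a known drop_of into the NA rows
  fill(temp, .direction = "up") %>%
  ungroup() %>%
  # fill missing drop_of with that pick-up minus 1 second, then restore the date-time class
  mutate(drop_of = ifelse(is.na(drop_of), temp - 1, drop_of),
         drop_of = as.POSIXct(drop_of, origin = "1970-01-01")) %>%
  select(-temp)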
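As a possible variation on the temporary-variable approach, the next pick-up of the same taxi can also be read directly with dplyr's lead() inside a grouped mutate(). The sketch below is not the code from the question: `toy`, `t0` and `filled` are hypothetical names, and the toy data only mimics the layout of df (taxi_nr, pick_up, drop_of). It assumes that "next pick-up" means the very next row of the same taxi when sorted by pick_up.

```r
library(dplyr)

# Hypothetical toy data with the same columns as df in the question
t0 <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
toy <- data.frame(
  taxi_nr = c(1, 1, 1, 2, 2),
  pick_up = t0 + c(0, 20, 40, 60, 80) * 60,
  drop_of = t0 + c(10, NA, 50, 70, NA) * 60
)

filled <- toy %>%
  group_by(taxi_nr) %>%
  arrange(pick_up, .by_group = TRUE) %>%
  # lead(pick_up) is the next pick-up of the same taxi (NA for a taxi's last ride);
  # if_else() keeps the POSIXct class, so no conversion back from numeric is needed
  mutate(drop_of = if_else(is.na(drop_of), lead(pick_up) - 1, drop_of)) %>%
  ungroup()

filled
```

In this toy data the missing drop-off of taxi 1 becomes one second before its third pick-up, while the missing drop-off of taxi 2 stays NA because that taxi has no later ride.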
And if you need your data in the format d/m/y-h/m/s, you can do that with the format() function (I am not sure whether what you described is exactly what you need, but you should at least get the idea):
df <- df %>% mutate(drop_of = format(drop_of, "%d/%m/%Y-%H/%M/%S"))
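One thing worth keeping in mind with this step is that format() returns text, not a date-time. The small sketch below illustrates that on a single hypothetical timestamp (`ts` and `label` are made-up names; the time zone is fixed to UTC only so the result does not depend on the local setting).

```r
# Hypothetical single timestamp, fixed to UTC for a reproducible result
ts <- as.POSIXct("2020-01-01 00:29:59", tz = "UTC")

# Same format string as in the answer above
label <- format(ts, "%d/%m/%Y-%H/%M/%S")

label         # "01/01/2020-00/29/59"
class(label)  # "character"
```

Because the formatted column can no longer be used for date arithmetic or correct chronological sorting, it may be safer to do all time calculations first, or to store the formatted text in a separate column and keep drop_of as POSIXct.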