fill() in missing lubridate value from a different column

Time:09-14

Below is a fictional reproducible example of pick-up and drop-off times of four taxis. Taxis 1, 2, and 3 unfortunately each have a missing drop-off time. Fortunately, two of these times (for taxis 1 and 3) can be inferred to be at least 1 second before they pick up new customers (these are non-ride-sharing taxis, very corona-proof):

(In the real use case, the df below is the result of a group_by() and summarise() on another df.)

library(dplyr)
library(tidyr)  # provides fill()

x <- seq(as.POSIXct('2020/01/01'),  # Create sequence of dates
         as.POSIXct('2030/01/01'),
         by = "10 mins") %>% 
  head(20) %>%
  sort()

taxi_nr <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)

drop_of <- x[c(TRUE, FALSE)]
pick_up <- x[c(FALSE, TRUE)]

drop_of[2] <- NA
drop_of[5] <- NA
drop_of[7] <- NA

df <- data.frame(taxi_nr,pick_up,drop_of) %>%
  arrange(pick_up)

I wish to fill in the NAs of taxis 1 and 3. I have tried the following:

df <- df %>%
   fill(drop_of, .direction = "up")

However, this takes the drop-off value from the row below instead of the pick-up value from the row below, and it does not take the taxi number into account.

I have also thought about:

df <- df %>%
  filter(is.na(drop_of)) %>%
  mutate(drop_of, ov[, 1])

This seems to run into problems with the taxi_nr 2 case, as there is no [, 1] within that group, or so I believe. I have tried adding safely(), possibly() and quietly(), but that did not help:

df <- df %>%
  filter(is.na(drop_of)) %>%
  mutate(drop_of, purrr::safely(ov[, 1]))

Does anyone have a solution?

PS: once I have the right column for filling in, 1 second also needs to be subtracted from it, and the result needs to be in the right lubridate format (d/m/y-h/m/s).

THANKS!

CodePudding user response:

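You can use a temporary variable for this, although it does not look pretty:

df <- df %>%
  # temp holds pick_up only on rows whose drop_of is known
  mutate(temp = ifelse(is.na(drop_of), NA, pick_up)) %>% 
  group_by(taxi_nr) %>% 
  # pull the next known pick_up of the same taxi up into the NA rows
  fill(temp, .direction = "up") %>% 
  ungroup() %>% 
  # ifelse() drops the POSIXct class, so convert the numbers back afterwards
  mutate(drop_of = ifelse(is.na(drop_of), temp - 1, drop_of),
         drop_of = as.POSIXct(drop_of, origin = "1970-01-01")) %>% 
  select(-temp)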
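
A possible alternative, offered only as a sketch and not as part of the answer above: since the missing drop-off should come from the same taxi's next pick-up, dplyr's lead() and coalesce() can express that directly and keep the POSIXct class, so no temporary column or as.POSIXct() round trip is needed. The name `filled` and the explicit UTC time zone are assumptions made for this example. Unlike the temp/fill() approach, this always uses the immediately following pick-up of the same taxi.

```r
library(dplyr)

# Same toy data as in the question (UTC is an assumption for reproducibility)
x <- seq(as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
         by = "10 mins", length.out = 20)

df <- data.frame(
  taxi_nr = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  pick_up = x[c(FALSE, TRUE)],
  drop_of = x[c(TRUE, FALSE)]
)
df$drop_of[c(2, 5, 7)] <- NA

filled <- df %>%
  arrange(taxi_nr, pick_up) %>%
  group_by(taxi_nr) %>%
  # lead(pick_up) is the next pick-up of the same taxi; subtracting 1 removes one second.
  # coalesce() only uses it where drop_of is NA.
  mutate(drop_of = coalesce(drop_of, lead(pick_up) - 1)) %>%
  ungroup()
```

With this data, the missing drop-offs of taxis 1 and 3 become 00:49:59 and 02:29:59, while the one for taxi 2 stays NA because it sits on that taxi's last row and there is no later pick-up to infer it from.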

And if you need your data in the format d/m/y-h/m/s, you can do that with format(). Note that this turns the column into character strings. (I am not sure that what you described is exactly what you need, but you should at least get the idea.)

df <- df %>% mutate(drop_of = format(drop_of, "%d/%m/%Y-%H/%M/%S"))
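
To illustrate what format() does to a date-time, here is a small base R sketch on a single value. The variable names (`t`, `label`, `back`) and the UTC time zone are assumptions for the example; the format string is the one used above.

```r
# A POSIXct value one second before a 00:50:00 pick-up (UTC assumed)
t <- as.POSIXct("2020-01-01 00:50:00", tz = "UTC") - 1

# format() returns a character string, not a date-time
label <- format(t, "%d/%m/%Y-%H/%M/%S")
label          # "01/01/2020-00/49/59"
class(label)   # "character"

# If date-time arithmetic is needed again, parse the string back
back <- as.POSIXct(label, format = "%d/%m/%Y-%H/%M/%S", tz = "UTC")
```

Because the formatted column is plain text, it is probably easiest to do all subtraction and filling on the POSIXct column first and apply format() as the very last step.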