I have an hourly time series covering several days, as in the example below. I need to fill the NA values only in the morning window, from 6:00 AM to 9:00 AM. Each missing morning value should be filled with the average of the remaining hours of the same day, and the same for every other day's morning.
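set.seed(3)
# hourly timestamps from 2022-01-01 00:00 to 2022-01-10 23:00 (240 rows)
df <- data.frame(timestamp = seq(as.POSIXct('2022-01-01', tz = 'UTC'),
                                 as.POSIXct('2022-01-10 23:00', tz = 'UTC'),
                                 by = '1 hour'),
                 value = runif(240))
# set roughly 30% of the values to NA at random
df$value[runif(nrow(df)) < 0.3] <- NA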
Answer:
If I understand you correctly, this is one way to solve the task with dplyr. Only NAs between 06:00 and 09:00 (inclusive) are filled, each with the mean of the same day's other hours; NAs outside that window are left as they are:
library(dplyr)
library(lubridate)

df %>%
  dplyr::mutate(morning = lubridate::hour(timestamp) %in% 6:9,   # 06:00-09:00
                day = format(timestamp, format = '%Y-%m-%d')) %>%
  dplyr::group_by(day) %>%
  # fill morning NAs with the mean of the same day's non-morning hours
  dplyr::mutate(value = ifelse(morning & is.na(value),
                               mean(value[!morning], na.rm = TRUE),
                               value)) %>%
  dplyr::ungroup() %>%
  dplyr::select(-morning, -day)
# A tibble: 240 x 2
timestamp value
<dttm> <dbl>
1 2022-01-01 00:00:00 NA
2 2022-01-01 01:00:00 0.808
3 2022-01-01 02:00:00 0.385
4 2022-01-01 03:00:00 NA
5 2022-01-01 04:00:00 0.602
6 2022-01-01 05:00:00 0.604
7 2022-01-01 06:00:00 0.125
8 2022-01-01 07:00:00 0.295
9 2022-01-01 08:00:00 0.578
10 2022-01-01 09:00:00 0.631
# ... with 230 more rows
# i Use `print(n = ...)` to see more rows
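For comparison, here is a base-R sketch of the fill described in the question, without dplyr or lubridate. It assumes `timestamp` is POSIXct in UTC and treats "the remaining hours" as every hour outside 06:00-09:00; the function name `fill_morning_na()` and its `hours` argument are hypothetical, chosen for illustration.

```r
# Hypothetical helper: fill NAs in the morning window (default 06:00-09:00,
# inclusive) with the mean of the same day's non-morning values.
# Assumes df$timestamp is POSIXct in UTC and df$value is numeric.
fill_morning_na <- function(df, hours = 6:9) {
  hr  <- as.integer(format(df$timestamp, "%H", tz = "UTC"))
  day <- format(df$timestamp, "%Y-%m-%d", tz = "UTC")
  morning <- hr %in% hours
  # per-row mean of that day's non-morning values (NaN if the day has none observed)
  day_mean <- ave(ifelse(morning, NA, df$value), day,
                  FUN = function(x) mean(x, na.rm = TRUE))
  to_fill <- morning & is.na(df$value)
  df$value[to_fill] <- day_mean[to_fill]
  df
}

# Usage on the data from the question
set.seed(3)
df <- data.frame(timestamp = seq(as.POSIXct("2022-01-01", tz = "UTC"),
                                 as.POSIXct("2022-01-10 23:00", tz = "UTC"),
                                 by = "1 hour"),
                 value = runif(240))
df$value[runif(nrow(df)) < 0.3] <- NA
filled <- fill_morning_na(df)
```

`ave()` returns a vector the same length as its input, so the per-day mean lines up row by row with `df` and can be indexed with the same logical mask used to pick the rows to fill.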
Answer (output only):
timestamp value after day
1 2022-01-01 00:00:00 NaN NA 00
2 2022-01-01 01:00:00 0.808 NA 01
3 2022-01-01 02:00:00 0.385 NA 02
4 2022-01-01 03:00:00 NaN NA 03
5 2022-01-01 04:00:00 0.602 NA 04
6 2022-01-01 05:00:00 0.604 NA 05
7 2022-01-01 06:00:00 0.125 NA 06
8 2022-01-01 07:00:00 0.295 NA 07
9 2022-01-01 08:00:00 0.578 NA 08
10 2022-01-01 09:00:00 0.631 NA 09
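The NaN entries in this output are what `mean(x, na.rm = TRUE)` returns when every element of `x` is NA: dropping all the NAs leaves an empty vector, whose mean is NaN. A small sketch of a hypothetical helper, `safe_mean()`, that returns NA instead, so a day with no observed non-morning hours keeps its gap rather than receiving NaN:

```r
# mean() of an all-NA vector with na.rm = TRUE is the mean of an empty vector
mean(c(NA_real_, NA_real_), na.rm = TRUE)   # NaN

# Hypothetical helper: NA_real_ when nothing is observed, otherwise the usual mean
safe_mean <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

safe_mean(c(NA_real_, NA_real_))  # NA
safe_mean(c(0.2, NA, 0.4))        # 0.3
```

Swapping `safe_mean()` in for `mean(..., na.rm = TRUE)` inside the grouped `mutate()` is one way to keep unfillable mornings as plain NA.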