I have a time series with hourly values over several days, as in the example below. I need to fill the NA values only for the morning hours, from 6:00 AM to 9:00 AM. Each gap should be filled with the average of the remaining hours of the same day, and the same applies to the mornings of the other days:


set.seed(3)
df <- data.frame( timestamp = seq(as.POSIXct('2022-01-01', tz='utc'),as.POSIXct('2022-01-10 23:00', tz='utc'), by = '1 hour') ,
                  value = runif(240))
df$value[runif(nrow(df)) < 0.3] <- NA

CodePudding user response：

If I understand you correctly, this is one way to solve the task with dplyr. It fills every NA before 10:00 with the mean of the same day's values from 11:00 onward:

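library(dplyr)  # provides %>%; lubridate must be installed for hour()

df %>% 
    # keep only values from 11:00 onward; label each row with its calendar day
    dplyr::mutate(after = ifelse(lubridate::hour(timestamp) > 10, value, NA), 
                  day = format(timestamp, format = '%Y-%m-%d')) %>% 
    dplyr::group_by(day) %>% 
    # fill NAs before 10:00 with the mean of that day's later values
    dplyr::mutate(value = ifelse(lubridate::hour(timestamp) <10 & is.na(value), mean(after, na.rm = TRUE), value)) %>% 
    dplyr::ungroup() %>% 
    dplyr::select(-after, -day)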

# A tibble: 240 x 2
   timestamp           value
   <dttm>              <dbl>
 1 2022-01-01 00:00:00 0.427
 2 2022-01-01 01:00:00 0.808
 3 2022-01-01 02:00:00 0.385
 4 2022-01-01 03:00:00 0.427
 5 2022-01-01 04:00:00 0.602
 6 2022-01-01 05:00:00 0.604
 7 2022-01-01 06:00:00 0.125
 8 2022-01-01 07:00:00 0.295
 9 2022-01-01 08:00:00 0.578
10 2022-01-01 09:00:00 0.631
# ... with 230 more rows
# i Use `print(n = ...)` to see more rows
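
If the goal is strictly the 06:00 to 09:00 window from the question, a variant could look like the sketch below. It assumes that "the remaining hours of the same day" means every hour outside 06:00 to 09:00 on that calendar day, which is my reading of the question and not something the asker confirmed. The columns day, morning and rest_mean are helper names made up for this example.

```r
library(dplyr)
library(lubridate)

# Example data from the question
set.seed(3)
df <- data.frame(
  timestamp = seq(as.POSIXct("2022-01-01", tz = "UTC"),
                  as.POSIXct("2022-01-10 23:00", tz = "UTC"), by = "1 hour"),
  value = runif(240)
)
df$value[runif(nrow(df)) < 0.3] <- NA

filled <- df %>%
  # day and morning are helper columns (hypothetical names)
  mutate(day = as.Date(timestamp, tz = "UTC"),
         morning = hour(timestamp) >= 6 & hour(timestamp) <= 9) %>%
  group_by(day) %>%
  # mean of the non-morning hours of the same day (NaN if none are observed),
  # used only where a morning value is missing
  mutate(rest_mean = mean(value[!morning], na.rm = TRUE),
         value = if_else(morning & is.na(value), rest_mean, value)) %>%
  ungroup() %>%
  select(timestamp, value)
```

Here filled has the same rows as df. NAs outside the morning window are left untouched, and a morning gap stays NaN only if the rest of that day has no observed values at all.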

CodePudding user response：

   timestamp           value after day
 1 2022-01-01 00:00:00 NaN   NA    00
 2 2022-01-01 01:00:00 0.808 NA    01
 3 2022-01-01 02:00:00 0.385 NA    02
 4 2022-01-01 03:00:00 NaN   NA    03
 5 2022-01-01 04:00:00 0.602 NA    04
 6 2022-01-01 05:00:00 0.604 NA    05
 7 2022-01-01 06:00:00 0.125 NA    06
 8 2022-01-01 07:00:00 0.295 NA    07
 9 2022-01-01 08:00:00 0.578 NA    08
10 2022-01-01 09:00:00 0.631 NA    09

... with 230 more rows

i Use `print(n = ...)` to see more rows
