Home > database >  Fill missing values of dates using the number of the weekday
Fill missing values of dates using the number of the weekday

Time:02-05

I have a dataset in long format. Every subject in the dataset was observed five times during the week. I have a column with the number of the day of the week in which the observation was supposed to happen/happened and another column with the actual dates of the observations. The latter column has some missing values. I would like to use the information on the first column to fill the missing values in the second column. Here is a toy dataset:

df <- data.frame(case = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 day = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                 date = as.Date(c("2023-01-02", "2023-01-03", NA, NA, "2023-01-06",
                                  NA, "2021-05-11", "2021-05-12", "2021-05-13", NA)))
df
#  case day       date
#     1   1 2023-01-02
#     1   2 2023-01-03
#     1   3       <NA>
#     1   4       <NA>
#     1   5 2023-01-06
#     2   1       <NA>
#     2   2 2021-05-11
#     2   3 2021-05-12
#     2   4 2021-05-13
#     2   5       <NA>

And here is the desired output:

#   case day       date
#1     1   1 2023-01-02
#2     1   2 2023-01-03
#3     1   3 2023-01-04
#4     1   4 2023-01-05
#5     1   5 2023-01-06
#6     2   1 2021-05-10
#7     2   2 2021-05-11
#8     2   3 2021-05-12
#9     2   4 2021-05-13
#10    2   5 2021-05-14

CodePudding user response:

Does this work for you? No linear models are used.

library(tidyverse)
df2 <- 
  df %>%
  mutate(
    ref_date = case_when(
      case == 1 ~ as.Date("2023-01-01"),
      case == 2 ~ as.Date("2021-05-09")
    ),
    date2 = as.Date(day, origin = ref_date)
  )

Output:

> df2
   case day       date   ref_date      date2
1     1   1 2023-01-02 2023-01-01 2023-01-02
2     1   2 2023-01-03 2023-01-01 2023-01-03
3     1   3       <NA> 2023-01-01 2023-01-04
4     1   4       <NA> 2023-01-01 2023-01-05
5     1   5 2023-01-06 2023-01-01 2023-01-06
6     2   1       <NA> 2021-05-09 2021-05-10
7     2   2 2021-05-11 2021-05-09 2021-05-11
8     2   3 2021-05-12 2021-05-09 2021-05-12
9     2   4 2021-05-13 2021-05-09 2021-05-13
10    2   5       <NA> 2021-05-09 2021-05-14

I concede that G.G.'s answer has the advantage that you don't need to hardcode the reference date.

P.S. here is a pure tidyverse solution without any hardcoding:

df2 <- 
  df %>%
  mutate(ref_date = date - day) %>%
  group_by(case) %>%
  fill(ref_date, .direction = "downup") %>%
  ungroup() %>%
  mutate(date2 = as.Date(day, origin = ref_date))

CodePudding user response:

1) Convert case to factor and then use predict with lm to fill in the NA's. No packages are used.

within(df, {
  case <- factor(case)
  date <- .Date(predict(lm(date ~ case/day), data.frame(case, date)))
})

giving

   case day       date
1     1   1 2023-01-02
2     1   2 2023-01-03
3     1   3 2023-01-04
4     1   4 2023-01-05
5     1   5 2023-01-06
6     2   1 2021-05-10
7     2   2 2021-05-11
8     2   3 2021-05-12
9     2   4 2021-05-13
10    2   5 2021-05-14

2) Find the mean day and date and then use day to appropriately offset each row.

library(dplyr)  # version 1.1.0 or later
df %>%
  mutate(date = {
    Mean <- Map(mean, na.omit(pick(date, day)))
    Mean$date   day - Mean$day
  }, .by = case)
  • Related