I have a dataset in long format. Every subject in the dataset was observed five times during the week. I have a column with the number of the day of the week in which the observation was supposed to happen/happened and another column with the actual dates of the observations. The latter column has some missing values. I would like to use the information on the first column to fill the missing values in the second column. Here is a toy dataset:
df <- data.frame(case = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
day = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
date = as.Date(c("2023-01-02", "2023-01-03", NA, NA, "2023-01-06",
NA, "2021-05-11", "2021-05-12", "2021-05-13", NA)))
df
# case day date
# 1 1 2023-01-02
# 1 2 2023-01-03
# 1 3 <NA>
# 1 4 <NA>
# 1 5 2023-01-06
# 2 1 <NA>
# 2 2 2021-05-11
# 2 3 2021-05-12
# 2 4 2021-05-13
# 2 5 <NA>
And here is the desired output:
# case day date
#1 1 1 2023-01-02
#2 1 2 2023-01-03
#3 1 3 2023-01-04
#4 1 4 2023-01-05
#5 1 5 2023-01-06
#6 2 1 2021-05-10
#7 2 2 2021-05-11
#8 2 3 2021-05-12
#9 2 4 2021-05-13
#10 2 5 2021-05-14
CodePudding user response:
Does this work for you? No linear models are used.
library(tidyverse)
df2 <-
df %>%
mutate(
ref_date = case_when(
case == 1 ~ as.Date("2023-01-01"),
case == 2 ~ as.Date("2021-05-09")
),
date2 = as.Date(day, origin = ref_date)
)
Output:
> df2
case day date ref_date date2
1 1 1 2023-01-02 2023-01-01 2023-01-02
2 1 2 2023-01-03 2023-01-01 2023-01-03
3 1 3 <NA> 2023-01-01 2023-01-04
4 1 4 <NA> 2023-01-01 2023-01-05
5 1 5 2023-01-06 2023-01-01 2023-01-06
6 2 1 <NA> 2021-05-09 2021-05-10
7 2 2 2021-05-11 2021-05-09 2021-05-11
8 2 3 2021-05-12 2021-05-09 2021-05-12
9 2 4 2021-05-13 2021-05-09 2021-05-13
10 2 5 <NA> 2021-05-09 2021-05-14
I concede that G.G.'s answer has the advantage that you don't need to hardcode the reference date.
P.S. here is a pure tidyverse solution without any hardcoding:
df2 <-
df %>%
mutate(ref_date = date - day) %>%
group_by(case) %>%
fill(ref_date, .direction = "downup") %>%
ungroup() %>%
mutate(date2 = as.Date(day, origin = ref_date))
CodePudding user response:
1) Convert case to factor and then use predict with lm to fill in the NA's. No packages are used.
within(df, {
case <- factor(case)
date <- .Date(predict(lm(date ~ case/day), data.frame(case, date)))
})
giving
case day date
1 1 1 2023-01-02
2 1 2 2023-01-03
3 1 3 2023-01-04
4 1 4 2023-01-05
5 1 5 2023-01-06
6 2 1 2021-05-10
7 2 2 2021-05-11
8 2 3 2021-05-12
9 2 4 2021-05-13
10 2 5 2021-05-14
2) Find the mean day and date and then use day to appropriately offset each row.
library(dplyr) # version 1.1.0 or later
df %>%
mutate(date = {
Mean <- Map(mean, na.omit(pick(date, day)))
Mean$date day - Mean$day
}, .by = case)