I would like to transform this data frame:
id | date | X |
---|---|---|
001 | 2022-04-12 | dsf |
001 | 2022-04-13 | df |
001 | 2022-04-14 | sdw |
002 | 2022-04-12 | we3 |
002 | 2022-04-13 | r45 |
002 | 2022-04-14 | 2s |
003 | 2022-04-12 | de65 |
003 | 2022-04-13 | hjk |
004 | 2022-04-12 | vcbdf |
005 | 2022-04-12 | 342f |
to this one:
id | date | X |
---|---|---|
001 | 2022-04-12 | dsf |
001 | 2022-04-13 | df |
001 | 2022-04-14 | sdw |
002 | 2022-04-12 | we3 |
002 | 2022-04-13 | r45 |
002 | 2022-04-14 | 2s |
003 | 2022-04-12 | de65 |
003 | 2022-04-13 | hjk |
003 | 2022-04-14 | NA |
004 | 2022-04-12 | vcbdf |
004 | 2022-04-13 | NA |
004 | 2022-04-14 | NA |
005 | 2022-04-12 | 342f |
005 | 2022-04-13 | NA |
005 | 2022-04-14 | NA |
As you can see, the target date range is 2022-04-12 to 2022-04-14, so every id that is missing one of those dates should get a row for it, with X set to NA.
Any idea how to do this?
CodePudding user response:
You can use tidyr::complete to give every id a row for each date in a sequence that runs from the minimum date in the data to the maximum. Combinations that were not in the original data get NA in X.
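library(dplyr)
library(tidyr)

df %>%
  # convert to Date so seq() can step one day at a time
  mutate(date = as.Date(date)) %>%
  # add every missing id/date combination; X is NA for the new rows
  complete(id, date = seq(min(date), max(date), by = "day"))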
# A tibble: 15 × 3
# id date X
# <int> <date> <chr>
# 1 1 2022-04-12 dsf
# 2 1 2022-04-13 df
# 3 1 2022-04-14 sdw
# 4 2 2022-04-12 we3
# 5 2 2022-04-13 r45
# 6 2 2022-04-14 2s
# 7 3 2022-04-12 de65
# 8 3 2022-04-13 hjk
# 9 3 2022-04-14 NA
#10 4 2022-04-12 vcbdf
#11 4 2022-04-13 NA
#12 4 2022-04-14 NA
#13 5 2022-04-12 342f
#14 5 2022-04-13 NA
#15 5 2022-04-14 NA
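
For a reproducible version, here is a minimal sketch that rebuilds the question's data and fills it in. Two things in it are assumptions rather than part of the answer above: id is stored as character so the leading zeros ("001") survive (the output above shows id read as an integer), and the date range is written out explicitly as the stated target, 2022-04-12 to 2022-04-14, instead of being taken from min(date) and max(date). The name result is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; id is character (assumption) to keep "001"
df <- tibble(
  id   = c("001", "001", "001", "002", "002", "002", "003", "003", "004", "005"),
  date = c("2022-04-12", "2022-04-13", "2022-04-14",
           "2022-04-12", "2022-04-13", "2022-04-14",
           "2022-04-12", "2022-04-13", "2022-04-12", "2022-04-12"),
  X    = c("dsf", "df", "sdw", "we3", "r45", "2s", "de65", "hjk", "vcbdf", "342f")
)

# Explicit target range, so the result does not depend on which dates
# happen to be present in the data
target_dates <- seq(as.Date("2022-04-12"), as.Date("2022-04-14"), by = "day")

result <- df %>%
  mutate(date = as.Date(date)) %>%
  complete(id, date = target_dates)

print(result)
```

Using a fixed range matters if the last target date is missing for every id: min(date) and max(date) would then stop short of it, while an explicit sequence still produces all the rows.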