I want to impute missing weeks record with 0 values in duration column for each household, individual combination.
The minimum week here is w51 of 2021 and goes upto w4 of 2022
For Household 1001 - individual 1 combination, week 3 is missing in the sequence.
Household 1002 - individual 2, week 52,week 2 and week 4 is missing
Final dataset would be:
what I tried is using complete function from tidyr after group by with household and individual but its not working.
In actual dataset minimium and maximum weeks will be changing.
Here is the sample dataset
data <- data.frame(household=c(1001,1001,1001,1001,1001,1002,1002,1002,1003,1003,1003),
individual = c(1,1,1,1,1,2,2,2,1,1,1),
year = c(2021,2021,2022,2022,2022,2021,2022,2022,2022,2022,2022),
week =c("w51","w52","w1","w2","w4","w51","w1","w3","w1","w2","w3"),
duration =c(20,23,24,56,78,12,34,67,87,89,90))
CodePudding user response:
Using the examples on the ?complete
help page, you can use nesting()
to give you what you want
data %>%
complete(nesting(household, individual), nesting(year, week), fill=list(duration=0))
# household individual year week duration
# <dbl> <dbl> <dbl> <chr> <dbl>
# 1 1001 1 2021 w51 20
# 2 1001 1 2021 w52 23
# 3 1001 1 2022 w1 24
# 4 1001 1 2022 w2 56
# 5 1001 1 2022 w3 0
# 6 1001 1 2022 w4 78
# 7 1002 2 2021 w51 12
# 8 1002 2 2021 w52 0
# 9 1002 2 2022 w1 34
# 10 1002 2 2022 w2 0
# 11 1002 2 2022 w3 67
# 12 1002 2 2022 w4 0
# 13 1003 1 2021 w51 0
# 14 1003 1 2021 w52 0
# 15 1003 1 2022 w1 87
# 16 1003 1 2022 w2 89
# 17 1003 1 2022 w3 90
# 18 1003 1 2022 w4 0