i have what seems like a pretty simple question, but i haven't been able to adapt the answers to similar questions to my situation, including this one: "Add row for each group with missing value"
i have some data that looks like this:
# A tibble: 265 x 4
anon_ID assistance_date Benefit_1 nth_assistance_interaction
<int> <chr> <chr> <dbl>
1 8 2020-04-10 Medical 5
2 8 2020-04-13 Medical 10
3 8 2020-04-15 Medical 15
4 8 2020-04-21 Medical 20
5 11 2020-06-17 Housing 5
6 11 2020-06-25 Financial 10
7 11 2021-01-27 Financial 15
8 26 2020-05-18 Legal 5
9 26 2021-06-01 Food 10
10 26 2021-08-02 Utilities 15
# ... with 255 more rows
i want to modify it so that each anon_ID has four observations, one for each unique value of nth_assistance_interaction. the values of assistance_date and Benefit_1 should be NA when real values for these variables don't exist.
e.g., for anon_ID = 11, these two variables would have NA values when nth_assistance_interaction = 20.
# A tibble: 265 x 4
anon_ID assistance_date Benefit_1 nth_assistance_interaction
<int> <chr> <chr> <dbl>
1 8 2020-04-10 Medical 5
2 8 2020-04-13 Medical 10
3 8 2020-04-15 Medical 15
4 8 2020-04-21 Medical 20
5 11 2020-06-17 Housing 5
6 11 2020-06-25 Financial 10
7 11 2021-01-27 Financial 15
8 11 NA NA 20
9 26 2020-05-18 Legal 5
10 26 2021-06-01 Food 10
11 26 2021-08-02 Utilities 15
# ... with 255 more rows
this is just one example of what i'm trying to accomplish. it could also be the case that anon_ID = 27 only has one observation for nth_assistance_interaction, and so i would need to add three rows for them.
how can i go about making this happen? thanks in advance.
Answer:
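We may group by 'anon_ID' and use complete from tidyr to expand the data so that every group has a row for each listed value of nth_assistance_interaction. The columns not named in complete (assistance_date and Benefit_1) are NA in the added rows.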
library(dplyr)
library(tidyr)

df1 %>%
  group_by(anon_ID) %>%
  # add a row for every listed value that the group is missing
  complete(nth_assistance_interaction = c(5, 10, 15, 20)) %>%
  ungroup()
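To check the result on something small, here is a sketch that rebuilds the first ten rows shown in the question as a toy df1 (the real data has 265 rows, so this is only an illustration). It uses a variant of the same idea, passing anon_ID to complete directly instead of grouping first. The select and arrange steps are only there to restore the original column and row order, and the name out is an arbitrary choice.

```r
library(dplyr)
library(tidyr)

# Toy data: the first ten rows printed in the question (assumed representative)
df1 <- tibble(
  anon_ID = c(8L, 8L, 8L, 8L, 11L, 11L, 11L, 26L, 26L, 26L),
  assistance_date = c("2020-04-10", "2020-04-13", "2020-04-15", "2020-04-21",
                      "2020-06-17", "2020-06-25", "2021-01-27",
                      "2020-05-18", "2021-06-01", "2021-08-02"),
  Benefit_1 = c("Medical", "Medical", "Medical", "Medical",
                "Housing", "Financial", "Financial",
                "Legal", "Food", "Utilities"),
  nth_assistance_interaction = c(5, 10, 15, 20, 5, 10, 15, 5, 10, 15)
)

out <- df1 %>%
  # cross every anon_ID present in the data with the four expected values
  complete(anon_ID, nth_assistance_interaction = c(5, 10, 15, 20)) %>%
  # complete() moves the completed columns to the front; restore the order
  select(anon_ID, assistance_date, Benefit_1, nth_assistance_interaction) %>%
  arrange(anon_ID, nth_assistance_interaction)

print(out)
```

With this toy input, anon_ID 11 and 26 each gain one row with nth_assistance_interaction = 20 and NA in assistance_date and Benefit_1. An ID with a single observation would gain three rows in the same way.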