Home > Net >  adding rows by group to get same number of observations by group
adding rows by group to get same number of observations by group

Time:03-05

i have what seems like a pretty simple question, but i haven't been able to successfully adapt solutions to similar ones to my situation, including this one: Add row for each group with missing value

i have some data that looks like this:

# A tibble: 265 x 4
   anon_ID assistance_date Benefit_1 nth_assistance_interaction
     <int> <chr>           <chr>                          <dbl>
 1       8 2020-04-10      Medical                            5
 2       8 2020-04-13      Medical                           10
 3       8 2020-04-15      Medical                           15
 4       8 2020-04-21      Medical                           20
 5      11 2020-06-17      Housing                            5
 6      11 2020-06-25      Financial                         10
 7      11 2021-01-27      Financial                         15
 8      26 2020-05-18      Legal                              5
 9      26 2021-06-01      Food                              10
10      26 2021-08-02      Utilities                         15
# ... with 255 more rows

i want to modify it so that each anon_ID has four observations, one for each unique value of nth_assistance_interaction. the values of assistance_date and Benefit_1 should be NA when real values for these variables don't exist.

e.g., for anon_ID = 11, these two variables would have NA values when nth_assistance_interaction = 20.

# A tibble: 265 x 4
   anon_ID assistance_date Benefit_1 nth_assistance_interaction
     <int> <chr>           <chr>                          <dbl>
 1       8 2020-04-10      Medical                            5
 2       8 2020-04-13      Medical                           10
 3       8 2020-04-15      Medical                           15
 4       8 2020-04-21      Medical                           20
 5      11 2020-06-17      Housing                            5
 6      11 2020-06-25      Financial                         10
 7      11 2021-01-27      Financial                         15
 8      11 NA              NA                                20
 9      26 2020-05-18      Legal                              5
10      26 2021-06-01      Food                              10
11      26 2021-08-02      Utilities                         15
# ... with 255 more rows

this is just one example of what i'm trying to accomplish. it could also be the case that anon_ID = 27 only has one observation for nth_assistance_interaction, and so i would need to add three rows for them.

how can i go about making this happen? thanks in advance.

CodePudding user response:

We may group by 'anon_ID' and use complete to expand the data

library(dplyr)
library(tidyr)
df1 %> 
  group_by(anon_ID) %>% 
  complete(nth_assistance_interaction = c(5, 10, 15, 20)) %>% 
  ungroup
  • Related