How to add rows so that each group has equal number of rows?-CodePudding

I have a data frame with unequal numbers of rows per group, see df in the example below. I would like to add rows containing the group name and NAs in all other columns so that there is an equal number of rows per group like in df.desired. The rows should be added after the last row from the respective group.

Example:

df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),  
                         col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                         col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
> df
  group col1 col2
1     A    1   12
2     A    1   13
3     A    1   14
4     A    1   15
5     B    2   21
6     B    2   22
7     B    2   23
8     C    3   31
9     C    3   32
df.desired = data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),  
                         col1 = c(1, 1, 1, 1, 2, 2, 2, NA, 3, 3, NA, NA),
                         col2 = c(12, 13, 14, 15, 21, 22, 23, NA, 31, 32, NA, NA))
> df.desired
   group col1 col2
1      A    1   12
2      A    1   13
3      A    1   14
4      A    1   15
5      B    2   21
6      B    2   22
7      B    2   23
8      B   NA   NA
9      C    3   31
10     C    3   32
11     C   NA   NA
12     C   NA   NA

I know how to do this with a loop but that would be super slow and I would prefer to use dplyr if possible. Does anyone have any ideas?

CodePudding user response：

How about this:

library(dplyr)
df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),  
               col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
               col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
maxgp <- max(table(df$group))

df %>% 
  group_by(group) %>% 
  summarise(across(everything(), ~c(.x, rep(NA, maxgp-n()))))
#> `summarise()` has grouped output by 'group'. You can override using the
#> `.groups` argument.
#> # A tibble: 12 × 3
#> # Groups:   group [3]
#>    group  col1  col2
#>    <chr> <dbl> <dbl>
#>  1 A         1    12
#>  2 A         1    13
#>  3 A         1    14
#>  4 A         1    15
#>  5 B         2    21
#>  6 B         2    22
#>  7 B         2    23
#>  8 B        NA    NA
#>  9 C         3    31
#> 10 C         3    32
#> 11 C        NA    NA
#> 12 C        NA    NA

^{Created on 2023-02-01 by the reprex package (v2.0.1)}

CodePudding user response：

You can create row numbers for each group and then tidyr::complete:

library(dplyr)

df %>%
  group_by(group) %>%
  mutate(id = row_number()) %>%
  ungroup() %>% 
  tidyr::complete(group, id) %>%
  select(-id)

# # A tibble: 12 × 3
#    group  col1  col2
#    <chr> <dbl> <dbl>
#  1 A         1    12
#  2 A         1    13
#  3 A         1    14
#  4 A         1    15
#  5 B         2    21
#  6 B         2    22
#  7 B         2    23
#  8 B        NA    NA
#  9 C         3    31
# 10 C         3    32
# 11 C        NA    NA
# 12 C        NA    NA

Update (from @Maël's answer)

After dplyr 1.1.0, Per-operation grouping with .by/by is supported for mutate(), summarise(), filter(), and the slice() family. The code can be simplified to

df %>%
  mutate(id = row_number(), .by = group) %>%
  tidyr::complete(group, id) %>%
  select(-id)