I have a data frame with unequal numbers of rows per group, see df
in the example below. I would like to add rows containing the group name and NAs in all other columns so that there is an equal number of rows per group like in df.desired
. The rows should be added after the last row from the respective group.
Example:
df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),
col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
> df
group col1 col2
1 A 1 12
2 A 1 13
3 A 1 14
4 A 1 15
5 B 2 21
6 B 2 22
7 B 2 23
8 C 3 31
9 C 3 32
df.desired = data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
col1 = c(1, 1, 1, 1, 2, 2, 2, NA, 3, 3, NA, NA),
col2 = c(12, 13, 14, 15, 21, 22, 23, NA, 31, 32, NA, NA))
> df.desired
group col1 col2
1 A 1 12
2 A 1 13
3 A 1 14
4 A 1 15
5 B 2 21
6 B 2 22
7 B 2 23
8 B NA NA
9 C 3 31
10 C 3 32
11 C NA NA
12 C NA NA
I know how to do this with a loop but that would be super slow and I would prefer to use dplyr
if possible. Does anyone have any ideas?
CodePudding user response:
How about this:
library(dplyr)
df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),
col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
maxgp <- max(table(df$group))
df %>%
group_by(group) %>%
summarise(across(everything(), ~c(.x, rep(NA, maxgp-n()))))
#> `summarise()` has grouped output by 'group'. You can override using the
#> `.groups` argument.
#> # A tibble: 12 × 3
#> # Groups: group [3]
#> group col1 col2
#> <chr> <dbl> <dbl>
#> 1 A 1 12
#> 2 A 1 13
#> 3 A 1 14
#> 4 A 1 15
#> 5 B 2 21
#> 6 B 2 22
#> 7 B 2 23
#> 8 B NA NA
#> 9 C 3 31
#> 10 C 3 32
#> 11 C NA NA
#> 12 C NA NA
Created on 2023-02-01 by the reprex package (v2.0.1)
CodePudding user response:
You can create row numbers for each group and then tidyr::complete
:
library(dplyr)
df %>%
group_by(group) %>%
mutate(id = row_number()) %>%
ungroup() %>%
tidyr::complete(group, id) %>%
select(-id)
# # A tibble: 12 × 3
# group col1 col2
# <chr> <dbl> <dbl>
# 1 A 1 12
# 2 A 1 13
# 3 A 1 14
# 4 A 1 15
# 5 B 2 21
# 6 B 2 22
# 7 B 2 23
# 8 B NA NA
# 9 C 3 31
# 10 C 3 32
# 11 C NA NA
# 12 C NA NA
Update (from @Maël's answer)
After dplyr 1.1.0
, Per-operation grouping with .by/by is supported for mutate()
, summarise()
, filter()
, and the slice()
family. The code can be simplified to
df %>%
mutate(id = row_number(), .by = group) %>%
tidyr::complete(group, id) %>%
select(-id)