Can't add rows to grouped data frames

This is a follow-up to my earlier question, "How to add a row to a dataframe modifying only some columns".

After solving that question, I wanted to apply the solution provided by stefan to a larger data frame, this time per group with group_by:

My dataframe:

df <- structure(list(test_id = c(1, 1, 1, 1, 1, 1, 1, 1), test_nr = c(1, 
1, 1, 1, 2, 2, 2, 2), region = c("A", "B", "C", "D", "A", "B", 
"C", "D"), test_value = c(3, 1, 1, 2, 4, 2, 4, 1)), class = "data.frame", row.names = c(NA, 
-8L))

  test_id test_nr region test_value
1       1       1      A          3
2       1       1      B          1
3       1       1      C          1
4       1       1      D          2
5       1       2      A          4
6       1       2      B          2
7       1       2      C          4
8       1       2      D          1

I now want to add a new row to each group with this code, which gives an error:

df %>%
  group_by(test_nr) %>% 
  add_row(test_id = .$test_id[1], test_nr = .$test_nr[1], region = "mean", test_value = mean(.$test_value))

Error: Can't add rows to grouped data frames.
Run `rlang::last_error()` to see where the error occurred.

My expected output would be:

   test_id test_nr region test_value
1        1       1      A       3.00
2        1       1      B       1.00
3        1       1      C       1.00
4        1       1      D       2.00
5        1       1   MEAN       1.75
6        1       2      A       4.00
7        1       2      B       2.00
8        1       2      C       4.00
9        1       2      D       1.00
10       1       2   MEAN       2.75

I have tried so far:

library(tidyverse)

df %>%
  group_by(test_nr) %>% 
  group_split() %>% 
  map_dfr(~ .x %>% 
            add_row(!!! map(.[4], mean)))

   test_id test_nr region test_value
     <dbl>   <dbl> <chr>       <dbl>
 1       1       1 A            3   
 2       1       1 B            1   
 3       1       1 C            1   
 4       1       1 D            2   
 5      NA      NA NA           1.75
 6       1       2 A            4   
 7       1       2 B            2   
 8       1       2 C            4   
 9       1       2 D            1   
10      NA      NA NA           2.75

How can I fill columns 1 to 3 of the new rows with my values instead of NA?

CodePudding user response：

You can combine your two approaches:

    df %>%
      split(~test_nr) %>%
      map_dfr(~ .x %>% 
                add_row(test_id = .$test_id[1], 
                        test_nr = .$test_nr[1], 
                        region = "MEAN",
                        test_value = mean(.$test_value)))
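Note that split() with a formula (~test_nr) needs R 4.1.0 or later. On older versions, a minimal sketch of the same idea with dplyr's group_split() could look like the following. The names groups and res are only illustrative, and map() plus bind_rows() stands in for map_dfr():

```r
library(dplyr)
library(purrr)
library(tibble)

df <- data.frame(
  test_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  test_nr = c(1, 1, 1, 1, 2, 2, 2, 2),
  region = c("A", "B", "C", "D", "A", "B", "C", "D"),
  test_value = c(3, 1, 1, 2, 4, 2, 4, 1)
)

# One tibble per value of test_nr (the pieces are no longer grouped,
# so add_row() is allowed on each of them)
groups <- group_split(df, test_nr)

# Append a summary row to each piece, then stack the pieces again
res <- groups %>%
  map(function(g) {
    add_row(g,
            test_id = g$test_id[1],
            test_nr = g$test_nr[1],
            region = "MEAN",
            test_value = mean(g$test_value))
  }) %>%
  bind_rows()

res
```

Because every piece is a plain tibble, the "Can't add rows to grouped data frames" error does not come up.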

CodePudding user response：

I recently wrote a small helper function for exactly this. The idea is to use group_modify() to take each group's data and append to it, via bind_rows(), the summary statistics calculated with summarise().

This is what it looks like in code:

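# group_modify() calls the function once per group: x holds the group's rows
# (without the grouping columns), y holds the group keys (unused here).
add_summary_rows <- function(.data, ...) {
  group_modify(.data, function(x, y) bind_rows(x, summarise(x, ...)))
}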

And here’s how that would work with your data:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  test_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  test_nr = c(1, 1, 1, 1, 2, 2, 2, 2),
  region = c("A", "B", "C", "D", "A", "B", "C", "D"),
  test_value = c(3, 1, 1, 2, 4, 2, 4, 1)
)

df %>% 
  group_by(test_id, test_nr) %>% 
  add_summary_rows(
    region = "MEAN",
    test_value = mean(test_value)
  )
#> # A tibble: 10 x 4
#> # Groups:   test_id, test_nr [2]
#>    test_id test_nr region test_value
#>      <dbl>   <dbl> <chr>       <dbl>
#>  1       1       1 A            3   
#>  2       1       1 B            1   
#>  3       1       1 C            1   
#>  4       1       1 D            2   
#>  5       1       1 MEAN         1.75
#>  6       1       2 A            4   
#>  7       1       2 B            2   
#>  8       1       2 C            4   
#>  9       1       2 D            1   
#> 10       1       2 MEAN         2.75
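One thing to watch for with this helper: group_modify() passes only the non-grouping columns to the function, so any column that is neither a grouping variable nor named in the summary ends up as NA in the added rows. That is why the example above groups by test_id as well. A small sketch of the alternative, grouping by test_nr only and carrying test_id along with first() (the helper and data are repeated so the snippet runs on its own; res_na and res_filled are illustrative names):

```r
library(dplyr, warn.conflicts = FALSE)

add_summary_rows <- function(.data, ...) {
  group_modify(.data, function(x, y) bind_rows(x, summarise(x, ...)))
}

df <- data.frame(
  test_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  test_nr = c(1, 1, 1, 1, 2, 2, 2, 2),
  region = c("A", "B", "C", "D", "A", "B", "C", "D"),
  test_value = c(3, 1, 1, 2, 4, 2, 4, 1)
)

# Grouped by test_nr only: test_id is not part of the summary,
# so bind_rows() fills it with NA in the added rows
res_na <- df %>%
  group_by(test_nr) %>%
  add_summary_rows(
    region = "MEAN",
    test_value = mean(test_value)
  )

# Same grouping, but test_id is carried over explicitly
res_filled <- df %>%
  group_by(test_nr) %>%
  add_summary_rows(
    test_id = first(test_id),
    region = "MEAN",
    test_value = mean(test_value)
  )
```

The result is still grouped in both cases; add ungroup() at the end if later steps should not operate per group.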

CodePudding user response：

You can get the desired result with this base R one-liner:

merge( df, aggregate( df, by = list( df$test_nr ), FUN = mean ), all = TRUE )[ , 1:4 ]

aggregate() computes the rows you need, and merge() with all = TRUE inserts them at the right places in your data frame. The merged result has one extra column at the end (the grouping column added by aggregate()), so keep only the first four columns. The code produces warnings for the region column, because mean() cannot average character values; these can be disregarded. In the new rows, region is NA rather than the label MEAN.

Making it a little more generic:

f <- "mean"
df1 <- merge( df, aggregate( df, by = list( df$test_id, df$test_nr ),
                             FUN = f ), all = TRUE )[ , 1:4 ]
df1$region[ is.na( df1$region ) ] <- toupper( f )

Here you also aggregate by test_id, the function is set in a single place, and its name is written to the region column of the new rows:

> df1
   test_id test_nr region test_value
1        1       1      A       3.00
2        1       1      B       1.00
3        1       1      C       1.00
4        1       1      D       2.00
5        1       1   MEAN       1.75
6        1       2      A       4.00
7        1       2      B       2.00
8        1       2      C       4.00
9        1       2      D       1.00
10       1       2   MEAN       2.75
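If the warnings from averaging the character region column are a concern, a possible warning-free variant is to aggregate only test_value through the formula interface and append the result with rbind(). This is a sketch of a different route to the same table, not the merge() approach above; means and combined are illustrative names:

```r
df <- data.frame(
  test_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  test_nr = c(1, 1, 1, 1, 2, 2, 2, 2),
  region = c("A", "B", "C", "D", "A", "B", "C", "D"),
  test_value = c(3, 1, 1, 2, 4, 2, 4, 1)
)

# Mean of test_value per (test_id, test_nr); region is not touched,
# so no warning about non-numeric input is raised
means <- aggregate(test_value ~ test_id + test_nr, data = df, FUN = mean)
means$region <- "MEAN"

# Append the summary rows, using the same column order as df
combined <- rbind(df, means[names(df)])

# order() leaves ties in their original order, so within each group
# the appended MEAN row stays after the original rows
combined <- combined[order(combined$test_id, combined$test_nr), ]
rownames(combined) <- NULL

combined
```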