Summarizing over groups within larger groups using dplyr

Time:08-17

I am working with a dataset that has repeated observations of each treatment within larger groups (plots 1xx, 2xx and 3xx), and I need the mean value of each treatment within each of those larger groups. I would like to use dplyr. My data is as follows:

ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
                 trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
                 value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

The summarised dataset needs to contain the mean of each treatment within each sub-group. Here is what it should look like when completed:

correct <- data.frame(trt = c("a","b","c","a","b","c","a","b","c"),
                       rating = c(1.5,3.5,5.5,1.5,3.5,5.5,1.5,3.5,5.5))

Here is what I have tried:

library(dplyr)

example <- ex %>% 
  dplyr::select(plot, trt, value) %>% 
  group_by(trt) %>% 
  summarise(rating = mean(value), .groups = 'drop')

However, the following is produced:

incorrect_example <- data.frame(trt=c("a","b","c"),
                                rating=c(1.5,3.5,5.5))
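The three-row result is expected: `summarise()` returns one row per group, and `group_by(trt)` defines only three groups. A quick check, assuming the larger group (called `block` here, a hypothetical name) is the hundreds digit of `plot`, which is inferred from the plot numbering:

```r
library(dplyr)

ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
                 trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
                 value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

# Grouping by treatment alone defines 3 groups, so summarise() returns 3 rows
n_by_trt <- ex |> group_by(trt) |> n_groups()

# Adding the (assumed) block, the hundreds digit of plot, defines 9 block x treatment groups
n_by_block_trt <- ex |> group_by(block = plot %/% 100, trt) |> n_groups()
```

So the fix is to add a second grouping variable that identifies the larger group.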

How can I produce results like those indicated in the correct dataframe?

Answer:

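We may need `rleid` from data.table. It gives each run of consecutive identical `trt` values its own id, so grouping by that id together with `trt` yields one mean per treatment per block (this relies on the rows being ordered as shown).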

library(dplyr)
library(data.table)
ex %>%  
  group_by(grp = rleid(trt), trt) %>% 
  summarise(rating = mean(value), .groups = 'drop') %>%
  select(-grp)

Output:

# A tibble: 9 × 2
  trt   rating
  <chr>  <dbl>
1 a        1.5
2 b        3.5
3 c        5.5
4 a        1.5
5 b        3.5
6 c        5.5
7 a        1.5
8 b        3.5
9 c        5.5
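If you would rather not load data.table, dplyr 1.1.0 and later provide `consecutive_id()`, described as an analogue of `rleid()`. A minimal sketch, assuming dplyr >= 1.1.0 is installed; like `rleid()`, it depends on the row order:

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
                 trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
                 value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

res <- ex |>
  # consecutive_id() starts a new id whenever trt differs from the previous row,
  # so each run of identical trt values becomes its own group
  group_by(grp = consecutive_id(trt), trt) |>
  summarise(rating = mean(value), .groups = "drop") |>
  # the run id is only needed for grouping
  select(-grp)
```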

Alternatively, use integer division (`%/%`) on `plot` to derive the block number (for example, `101 %/% 100` is 1), group by that together with `trt`, and then take the mean of the `value` column:

ex %>% 
  group_by(grp = plot %/% 100, trt) %>% 
  summarise(value = mean(value), .groups = 'drop') %>% 
  select(-grp)

Output:

# A tibble: 9 × 2
  trt   value
  <chr> <dbl>
1 a       1.5
2 b       3.5
3 c       5.5
4 a       1.5
5 b       3.5
6 c       5.5
7 a       1.5
8 b       3.5
9 c       5.5
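For comparison, the same `%/%` idea also works in base R without dplyr, here swapped in as `aggregate()`. This is a sketch: the `block` column name is hypothetical, and the reordering step is there because `aggregate()` sorts its rows by the grouping variables, with the first one varying fastest:

```r
ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
                 trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
                 value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

# Derive the block number from the hundreds digit of plot (101 %/% 100 == 1)
ex$block <- ex$plot %/% 100

# One mean per block x treatment combination
res <- aggregate(value ~ block + trt, data = ex, FUN = mean)

# Rows come back ordered by trt, then block; a stable reorder by block
# gives a, b, c within each block, matching the desired layout
res <- res[order(res$block), c("trt", "value")]
names(res)[2] <- "rating"
```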

Answer:

We can use:
library(dplyr)
library(stringr)

ex |> group_by(trt, str_extract(plot, "\\d")) |>
      summarise(rating = mean(value)) |> select(trt, rating)
Output:
# A tibble: 9 × 2
# Groups:   trt [3]
  trt   rating
  <chr>  <dbl>
1 a        1.5
2 a        1.5
3 a        1.5
4 b        3.5
5 b        3.5
6 b        3.5
7 c        5.5
8 c        5.5
9 c        5.5
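The rows above are sorted by treatment, and the result is still grouped by `trt`, because `group_by()` orders groups by the grouping columns in the order listed and `summarise()` drops only the last grouping level by default. Listing the block first and dropping all groups reproduces the interleaved order of the desired output. A sketch, using base `substr()` in place of `str_extract()` to take the first digit of `plot`; `block` is a hypothetical column name:

```r
library(dplyr)

ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
                 trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
                 value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

res <- ex |>
  # Block first, so groups are ordered by block and then by treatment;
  # substr() takes the first digit of plot ("101" -> "1")
  group_by(block = substr(as.character(plot), 1, 1), trt) |>
  # .groups = "drop" returns an ungrouped tibble instead of one still grouped by block
  summarise(rating = mean(value), .groups = "drop") |>
  select(trt, rating)
```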

Answer:

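You might want to use `mutate` rather than `summarise` and then add a filter afterwards that keeps one row per treatment and block (here, the rows holding each treatment's minimum value). The filter has to run before `ungroup()`; otherwise `min(value)` is taken over the whole data set and only the rows for treatment "a" survive. Note that `rating` is then the mean of each treatment across all blocks, which matches the desired output only because every block holds the same values.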

E.g.

library(dplyr)

ex |>
    select(plot, trt, value) |> 
    group_by(trt) |>
    mutate(rating = mean(value)) |>
    # filter while still grouped, so min(value) is taken within each treatment
    filter(value == min(value)) |>
    ungroup() |>
    select(-plot, -value)

Output:

# A tibble: 9 × 2
  trt   rating
  <chr>  <dbl>
1 a        1.5
2 b        3.5
3 c        5.5
4 a        1.5
5 b        3.5
6 c        5.5
7 a        1.5
8 b        3.5
9 c        5.5
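A variant of the `mutate` approach that computes the mean within each block rather than across all blocks, and then keeps one row per block and treatment with `distinct()` instead of a value-based filter. A sketch, assuming again that the block is the hundreds digit of `plot` (`block` is a hypothetical column name):

```r
library(dplyr)

ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
                 trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
                 value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

res <- ex |>
  # Mean within each block x treatment group, attached to every row of that group
  group_by(block = plot %/% 100, trt) |>
  mutate(rating = mean(value)) |>
  ungroup() |>
  # Keep one row per block x treatment, in order of first appearance
  distinct(block, trt, rating) |>
  select(-block)
```

This does not depend on which row holds the minimum, so it still works if the values differ between blocks.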