I am working with a dataset with repeated observations within each treatment, and I need to find the mean value within each of the treatments. I would like to use dplyr
. My data is as follows:
ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))
The summarised dataset needs to be the mean of each treatment within the sub-groups. Here is what it should look like when completed:
correct <- data.frame(trt = c("a","b","c","a","b","c","a","b","c"),
rating = c(1.5,3.5,5.5,1.5,3.5,5.5,1.5,3.5,5.5))
Here is what I have tried:
library(dplyr)
example <- ex %>%
dplyr::select(plot, trt, value) %>%
group_by(trt) %>%
summarise(rating = mean(value), .groups = 'drop')
However, the following is produced:
incorrect_example <- data.frame(trt=c("a","b","c"),
rating=c(1.5,3.5,5.5))
How can I produce results like those indicated in the correct
dataframe?
CodePudding user response:
We may need rleid
library(dplyr)
library(data.table)
ex %>%
group_by(grp = rleid(trt), trt) %>%
summarise(rating = mean(value), .groups = 'drop') %>%
select(-grp)
-output
# A tibble: 9 × 2
trt rating
<chr> <dbl>
1 a 1.5
2 b 3.5
3 c 5.5
4 a 1.5
5 b 3.5
6 c 5.5
7 a 1.5
8 b 3.5
9 c 5.5
Or may also use %/%
on the 'plot' to create a grouping column along with 'trt' as group, then get the mean
of the 'value' column
ex %>%
group_by(grp = plot %/% 100, trt) %>%
summarise(value = mean(value), .groups = 'drop') %>%
select(-grp)
-output
# A tibble: 9 × 2
trt value
<chr> <dbl>
1 a 1.5
2 b 3.5
3 c 5.5
4 a 1.5
5 b 3.5
6 c 5.5
7 a 1.5
8 b 3.5
9 c 5.5
CodePudding user response:
- We can use
library(dplyr)
library(stringr)
ex |> group_by(trt , str_extract(plot , "\\d")) |>
summarise(rating = mean(value)) |> select(trt , rating)
- Output
# A tibble: 9 × 2
# Groups: trt [3]
trt rating
<chr> <dbl>
1 a 1.5
2 a 1.5
3 a 1.5
4 b 3.5
5 b 3.5
6 b 3.5
7 c 5.5
8 c 5.5
9 c 5.5
CodePudding user response:
You might want to use mutate
rather than summarise
, and then add a filter afterwards (depending on what identifies a group - here the minimum value
).
E.g.
ex |>
select(plot, trt, value) |>
group_by(trt) |>
mutate(rating = mean(value)) |>
ungroup() |>
filter(value == min(value)) |>
select(-plot, -value)
Output:
# A tibble: 9 × 2
trt rating
<chr> <dbl>
1 a 1.5
2 b 3.5
3 c 5.5
4 a 1.5
5 b 3.5
6 c 5.5
7 a 1.5
8 b 3.5
9 c 5.5