I want to do a group_by() and summarise() operation on only two columns, with one grouping attribute, while keeping the other three columns unchanged. Those three columns hold the same value in every row. How can I do that? E.g.
> data<- data.frame(a=1:10, b=rep(1,10), c=rep(2,10), d=rep(3,10), e= c("small", "med", "larg", "larg", "larg", "med", "small", "small", "small", "med"))
> data %>% group_by(e) %>% summarise(a=mean(a))
# A tibble: 3 × 2
  e         a
  <chr> <dbl>
1 larg   4
2 med    6
3 small  6.25
but I want
# A tibble: 3 × 5
  e         a     b     c     d
  <chr> <dbl> <dbl> <dbl> <dbl>
1 larg   4        1     2     3
2 med    6        1     2     3
3 small  6.25     1     2     3
group_by() followed by summarise() always drops the other columns. How can I keep them?
CodePudding user response:
Add the other columns to group_by():
> library(tidyverse)
> data <- data.frame(a=1:10, b=rep(1,10), c=rep(2,10), d=rep(3,10), e= c("small", "med", "larg", "larg", "larg", "med", "small", "small", "small", "med"))
> data %>% group_by(e, b, c, d) %>% summarise(a=mean(a))
`summarise()` has grouped output by 'e', 'b', 'c'. You can override using the `.groups` argument.
# A tibble: 3 x 5
# Groups: e, b, c [3]
  e         b     c     d     a
  <chr> <dbl> <dbl> <dbl> <dbl>
1 larg      1     2     3  4
2 med       1     2     3  6
3 small     1     2     3  6.25
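The grouping message above can be avoided, and the result returned ungrouped, by setting the .groups argument. The sketch below assumes dplyr 1.0.0 or later; the names res and res2 are only illustrative. It also shows a second option that groups by e alone and takes the first value of each constant column, which only makes sense when b, c and d really are constant within each group.

```r
library(dplyr)

data <- data.frame(a = 1:10, b = rep(1, 10), c = rep(2, 10), d = rep(3, 10),
                   e = c("small", "med", "larg", "larg", "larg",
                         "med", "small", "small", "small", "med"))

# Option 1: group by e and the constant columns, then drop all grouping
res <- data %>%
  group_by(e, b, c, d) %>%
  summarise(a = mean(a), .groups = "drop")

# Option 2: group by e only and keep the first value of b, c and d per group
# (assumes these columns are constant within each group)
res2 <- data %>%
  group_by(e) %>%
  summarise(a = mean(a), across(c(b, c, d), first))
```

Option 2 returns the columns in the order e, a, b, c, d, which matches the layout asked for in the question.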
CodePudding user response:
You can also compute a new variable with group_by() and summarise() and keep the rest of your data frame "intact" by adding across() inside summarise(). Note that this returns one row per original row rather than one row per group. It can be useful if your other variables aren't always going to be the same.
data %>% group_by(e) %>%
  summarise(a=mean(a), across())
# A tibble: 10 x 5
# Groups: e [3]
   e         a     b     c     d
   <chr> <dbl> <dbl> <dbl> <dbl>
 1 larg   4        1     2     3
 2 larg   4        1     2     3
 3 larg   4        1     2     3
 4 med    6        1     2     3
 5 med    6        1     2     3
 6 med    6        1     2     3
 7 small  6.25     1     2     3
 8 small  6.25     1     2     3
 9 small  6.25     1     2     3
10 small  6.25     1     2     3
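If the goal is to keep every row, a grouped mutate() gives a similar result without relying on summarise() returning several rows per group. As far as I know, recent dplyr releases (1.1.0 onward) warn about both an empty across() call and multi-row summarise() results, so this may be the safer form there. A minimal sketch, with res as an illustrative name:

```r
library(dplyr)

data <- data.frame(a = 1:10, b = rep(1, 10), c = rep(2, 10), d = rep(3, 10),
                   e = c("small", "med", "larg", "larg", "larg",
                         "med", "small", "small", "small", "med"))

# mutate() keeps all rows and columns; only a is overwritten with its group mean
res <- data %>%
  group_by(e) %>%
  mutate(a = mean(a)) %>%
  ungroup()
```

Unlike the summarise() version, this keeps the original row and column order.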