I have a dataframe with multiple columns and want to summarise row-wise by taking the mean of the columns whose names start with a specific prefix. The result should contain only one column per prefix.
For example:
iris %>% aggregate(. ~ Species, data=., sum) %>% group_by(Species) %>% mutate(summarise(across(starts_with(c('Sepal','Petal')), mean), .groups = "rowwise"))
produces:
# A tibble: 3 × 6
# Groups: Species [3]
Species Sepal.Length Sepal.Width Petal.Length Petal.Width
<fct> <dbl> <dbl> <dbl> <dbl> <rowwise_df[,0]>
1 setosa 250. 171. 73.1 12.3
2 versicolor 297. 138. 213 66.3
3 virginica 329. 149. 278. 101.
However, I was expecting a dataframe like the following:
Species Sepal Petal
1 setosa 210.5 41.5
..
..
CodePudding user response:
The code mixes tidyverse with base R. This can be done directly in tidyverse: after grouping by 'Species', get the column-wise sum with across, then take the rowMeans of the numeric columns.
library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(across(everything(), sum), .groups = 'drop') %>%
  transmute(Species,
            Sepal = rowMeans(across(starts_with("Sepal"))),
            Petal = rowMeans(across(starts_with("Petal"))))
-output
# A tibble: 3 × 3
Species Sepal Petal
<fct> <dbl> <dbl>
1 setosa 211. 42.7
2 versicolor 218. 140.
3 virginica 239. 189.
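As a quick sanity check on the numbers above, the following sketch recomputes the setosa row by hand from the per-species sums. The variable names (sums, setosa, sepal_mean, petal_mean) are only illustrative and are not part of the original answer.

```r
library(dplyr)

# Per-species column sums: the intermediate table before rowMeans is applied
sums <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), sum), .groups = "drop")

# Pull out the setosa row and average the two Sepal and the two Petal sums
setosa <- filter(sums, Species == "setosa")
sepal_mean <- (setosa$Sepal.Length + setosa$Sepal.Width) / 2
petal_mean <- (setosa$Petal.Length + setosa$Petal.Width) / 2

# These should agree with the first row of the output (shown rounded as 211. and 42.7)
print(c(Sepal = sepal_mean, Petal = petal_mean))
```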
If we want to use rowwise in groups (note that rowwise would be slower than the vectorized rowMeans):
iris %>%
  group_by(Species) %>%
  summarise(across(everything(), sum), .groups = 'rowwise') %>%
  # list Species explicitly so that transmute keeps it in the result
  transmute(Species,
            Sepal = mean(c_across(starts_with("Sepal"))),
            Petal = mean(c_across(starts_with("Petal")))) %>%
  ungroup()
-output
# A tibble: 3 × 3
Species Sepal Petal
<fct> <dbl> <dbl>
1 setosa 211. 42.7
2 versicolor 218. 140.
3 virginica 239. 189.
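If there are many prefixes, writing one transmute entry per prefix gets repetitive. One possible alternative, shown here only as a sketch, is to reshape to long format with tidyr and let the prefix become a grouping variable. This assumes every non-grouping column is named Prefix.Suffix with exactly one dot, as in iris; the names part, measure, total and res are hypothetical and chosen just for illustration.

```r
library(dplyr)
library(tidyr)

res <- iris %>%
  # Split names such as "Sepal.Length" into part = "Sepal", measure = "Length"
  pivot_longer(-Species,
               names_to = c("part", "measure"),
               names_sep = "\\.") %>%
  # Sum each original column within a species
  group_by(Species, part, measure) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  # Average the sums that share a prefix
  group_by(Species, part) %>%
  summarise(value = mean(total), .groups = "drop") %>%
  # Back to one column per prefix
  pivot_wider(names_from = part, values_from = value)

print(res)
```

The result should hold the same values as the outputs above, although the prefix columns may appear in a different order.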