I want to group by keeping the continuous columns as rows and the categorical factors as the column headers with the aggregated record being the mean or min or max. This is a fundamental question, the answer to which I am not being able to figure out. Take the iris data as an example. I want to get the mean of sepal.width and sepal.length with respect to every species category.
library(dplyr)
mydata2 <-iris
# Groupby function for dataframe in R
summarise_at(group_by(mydata2,Species),vars(Sepal.Length),funs(mean(.,na.rm=TRUE)))
OUTPUT
Species Sepal.Length
<fct> <dbl>
1 setosa 5.01
2 versicolor 5.94
3 virginica 6.59
I want to get the same output with Sepal.Length as my rows instead of Species and the various factors of Species as my columns. I also want Sepal.Width, Petal.Length, Petal.Width as well How will I do that?
This is what I am looking for -
Species setosa versicolor virginica
1 Sepal.Length 5.01 5.94 6.59
Below this there should be Sepal.Width and other continuous columns as well. I have tried transposing but that is changing everything to character data type.
CodePudding user response:
One option to achieve your desired result would be to reshape your data after summarise
via e.g. pivot_longer
and pivot_wider
. If you do that often you could put the code into a convenience function to do that in one step:
Note: I also dropped the summarise_at
and switched to the new API using across
and where
.
library(dplyr)
library(tidyr)
summarise(group_by(iris, Species), across(where(is.numeric), mean, na.rm=TRUE)) %>%
pivot_longer(-Species, names_to = "var") %>%
pivot_wider(names_from = Species, values_from = value)
#> # A tibble: 4 × 4
#> var setosa versicolor virginica
#> <chr> <dbl> <dbl> <dbl>
#> 1 Sepal.Length 5.01 5.94 6.59
#> 2 Sepal.Width 3.43 2.77 2.97
#> 3 Petal.Length 1.46 4.26 5.55
#> 4 Petal.Width 0.246 1.33 2.03
CodePudding user response:
You can use tapply
insinde lapply
:
do.call(rbind, lapply(iris[sapply(iris, is.numeric)],
function(x) tapply(x, iris$Species, mean)))
# setosa versicolor virginica
#Sepal.Length 5.006 5.936 6.588
#Sepal.Width 3.428 2.770 2.974
#Petal.Length 1.462 4.260 5.552
#Petal.Width 0.246 1.326 2.026
CodePudding user response:
iris %>%
group_by(Species) %>%
summarize(across(everything(), mean)) %>%
t()
[,1] [,2] [,3]
Species "setosa" "versicolor" "virginica"
Sepal.Length "5.006" "5.936" "6.588"
Sepal.Width "3.428" "2.770" "2.974"
Petal.Length "1.462" "4.260" "5.552"
Petal.Width "0.246" "1.326" "2.026"