Home > other >  R - Group by on continuous variable headers with categorical variable factors as rows and aggregated
R - Group by on continuous variable headers with categorical variable factors as rows and aggregated

Time:11-22

I want to group by keeping the continuous columns as rows and the categorical factors as the column headers with the aggregated record being the mean or min or max. This is a fundamental question, the answer to which I am not being able to figure out. Take the iris data as an example. I want to get the mean of sepal.width and sepal.length with respect to every species category.

library(dplyr)

mydata2 <-iris 

# Groupby function for dataframe in R

summarise_at(group_by(mydata2,Species),vars(Sepal.Length),funs(mean(.,na.rm=TRUE)))

OUTPUT 

Species    Sepal.Length
  <fct>             <dbl>
1 setosa             5.01
2 versicolor         5.94
3 virginica          6.59

I want to get the same output with Sepal.Length as my rows instead of Species and the various factors of Species as my columns. I also want Sepal.Width, Petal.Length, Petal.Width as well How will I do that?

This is what I am looking for -

Species            setosa     versicolor  virginica
  
1 Sepal.Length       5.01      5.94       6.59

Below this there should be Sepal.Width and other continuous columns as well. I have tried transposing but that is changing everything to character data type.

CodePudding user response:

One option to achieve your desired result would be to reshape your data after summarise via e.g. pivot_longer and pivot_wider. If you do that often you could put the code into a convenience function to do that in one step:

Note: I also dropped the summarise_at and switched to the new API using across and where.

library(dplyr)
library(tidyr)

summarise(group_by(iris, Species), across(where(is.numeric), mean, na.rm=TRUE)) %>% 
  pivot_longer(-Species, names_to = "var") %>% 
  pivot_wider(names_from = Species, values_from = value)
#> # A tibble: 4 × 4
#>   var          setosa versicolor virginica
#>   <chr>         <dbl>      <dbl>     <dbl>
#> 1 Sepal.Length  5.01        5.94      6.59
#> 2 Sepal.Width   3.43        2.77      2.97
#> 3 Petal.Length  1.46        4.26      5.55
#> 4 Petal.Width   0.246       1.33      2.03

CodePudding user response:

You can use tapply insinde lapply:

do.call(rbind, lapply(iris[sapply(iris, is.numeric)],
                      function(x) tapply(x, iris$Species, mean)))
#             setosa versicolor virginica
#Sepal.Length  5.006      5.936     6.588
#Sepal.Width   3.428      2.770     2.974
#Petal.Length  1.462      4.260     5.552
#Petal.Width   0.246      1.326     2.026

CodePudding user response:

iris %>%
  group_by(Species) %>%
  summarize(across(everything(), mean)) %>%
  t()

             [,1]     [,2]         [,3]       
Species      "setosa" "versicolor" "virginica"
Sepal.Length "5.006"  "5.936"      "6.588"    
Sepal.Width  "3.428"  "2.770"      "2.974"    
Petal.Length "1.462"  "4.260"      "5.552"    
Petal.Width  "0.246"  "1.326"      "2.026"    
  • Related