Group by range and calculate mean in R and graph

Time:09-16

I have a table of data on cars: roughly 73 observations of 11 variables. I was asked to find the mean of MPG by gear_ratio. This is the output I get from aggregate():

dfdata=data.frame(cars10)
aggregate(x= dfdata$mpg, by=list(dfdata$gear_ratio), FUN=mean)
Gear_ratio Mean_MPG
2.19 14
2.24 21
2.26 25
2.28 17
2.47 21
2.53 17
2.56 17.5
2.73 21.3
2.75 20.12
2.87 34
3.05 23.6
3.15 17
3.5 23
3.87 18
3.9 30.3

Next I would like to group these means by ranges of Gear_ratio and graph them. The ranges need to be a) 2.0-2.5, b) 2.5-3.0, c) 3.0-3.5, and d) 3.5-4.0. I'd also like a different color for each group.

*I wasn't sure whether there is a way to group by range in the initial aggregate call I used to find the means in the first place.

CodePudding user response:

This is what I think the OP is asking for. I'm using the mtcars data for demonstration, with the wt variable standing in for gear_ratio because its values fall in a similar range:

library(dplyr) # for select, mutate, case_when, group_by and summarise
data = mtcars %>%
  select(wt, mpg) %>%
  # form the groups based on ranges; case_when checks the conditions in order,
  # so there is no need for compound tests such as wt >= 3.0 & wt < 3.5
  mutate(wt_group = case_when(wt > 4    ~ ">4",
                              wt >= 3.5 ~ "3.5-4",
                              wt >= 3.0 ~ "3.0-3.5",
                              wt >= 2.5 ~ "2.5-3.0",
                              wt >= 2.0 ~ "2.0-2.5",
                              wt < 2.0  ~ "<2.0")) %>%
  group_by(wt_group) %>%
  summarise(mean_mpg = mean(mpg))
> head(data)
# A tibble: 6 x 2
  wt_group mean_mpg
  <chr>       <dbl>
1 <2.0         30.5
2 >4           13.0
3 2.0-2.5      25.7
4 2.5-3.0      20.8
5 3.0-3.5      19.3
6 3.5-4        15.7

Then for the plot :

library(ggplot2)
plot = ggplot(data, aes(x = wt_group, y = mean_mpg, fill = wt_group)) +
  geom_col()
plot


Please edit the question and I can update the answer if this doesn't give you what you need.
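On the footnote about grouping by range in the original aggregate step: one possible approach in base R is to bin the raw column with cut() first and then aggregate on the bin. This is only a sketch. mtcars and wt again stand in for cars10 and gear_ratio, and the break points and the names cars, wt_group and means are assumptions made for the illustration.

```r
# Stand-in data: mtcars replaces the OP's cars10, wt replaces gear_ratio
cars <- mtcars

# Bin wt into half-unit ranges; right = FALSE gives intervals like [2,2.5)
cars$wt_group <- cut(cars$wt, breaks = seq(1.5, 5.5, by = 0.5), right = FALSE)

# Mean mpg per range, computed from the raw rows in one aggregate() call
means <- aggregate(mpg ~ wt_group, data = cars, FUN = mean)
print(means)

# One bar per range, each with its own color
barplot(means$mpg,
        names.arg = as.character(means$wt_group),
        col = rainbow(nrow(means)),
        xlab = "wt range", ylab = "Mean mpg")
```

With right = FALSE each range includes its lower bound and excludes its upper bound. Ranges with no rows are dropped from the aggregate() result.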

CodePudding user response:

library(dplyr)

# df is the table of means from the question (columns Gear_ratio and Mean_MPG)
df %>%
  mutate(gear_group = cut(Gear_ratio, seq(0, 4, .5)))

# A tibble: 15 x 3
   Gear_ratio Mean_MPG gear_group
        <dbl>    <dbl> <fct>     
 1       2.19     14   (2,2.5]   
 2       2.24     21   (2,2.5]   
 3       2.26     25   (2,2.5]   
 4       2.28     17   (2,2.5]   
 5       2.47     21   (2,2.5]   
 6       2.53     17   (2.5,3]   
 7       2.56     17.5 (2.5,3]   
 8       2.73     21.3 (2.5,3]   
 9       2.75     20.1 (2.5,3]   
10       2.87     34   (2.5,3]   
11       3.05     23.6 (3,3.5]   
12       3.15     17   (3,3.5]   
13       3.5      23   (3,3.5]   
14       3.87     18   (3.5,4]   
15       3.9      30.3 (3.5,4]  
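To go from the binned table to one mean and one colored bar per range, something along these lines might work. It is a sketch under assumptions: df is retyped from the means posted in the question, the names binned and p are made up for the example, and right = FALSE is used so that 3.5 falls in the 3.5-4.0 range rather than in (3,3.5]. It averages the per-ratio means, which will differ from the mean of the raw mpg values if the ratios have different numbers of cars behind them.

```r
library(dplyr)
library(ggplot2)

# Means from the question's aggregate() output, retyped here as df
df <- data.frame(
  Gear_ratio = c(2.19, 2.24, 2.26, 2.28, 2.47, 2.53, 2.56, 2.73,
                 2.75, 2.87, 3.05, 3.15, 3.5, 3.87, 3.9),
  Mean_MPG   = c(14, 21, 25, 17, 21, 17, 17.5, 21.3,
                 20.12, 34, 23.6, 17, 23, 18, 30.3)
)

# Bin into [2,2.5), [2.5,3), [3,3.5), [3.5,4) and average within each bin
binned <- df %>%
  mutate(gear_group = cut(Gear_ratio, breaks = seq(2, 4, by = 0.5),
                          right = FALSE)) %>%
  group_by(gear_group) %>%
  summarise(mean_mpg = mean(Mean_MPG))

# fill = gear_group gives each range its own color
p <- ggplot(binned, aes(x = gear_group, y = mean_mpg, fill = gear_group)) +
  geom_col() +
  labs(x = "Gear ratio range", y = "Mean MPG")
p
```

Running the same pipeline on the raw data, with mpg in place of Mean_MPG, would give the per-range mean of the original observations instead.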