Home > Mobile >  Averaging a column after grouping by another column in R
Averaging a column after grouping by another column in R

Time:10-19

I have a dataframe like this.

df <- data.frame(flight_no = c(515,4370,3730,4687,1124, 4137), carrier = c('B6','EV','MQ','EV','B6','EV'), dep_delay = c(10, 95, -7, 4, 6, 33), is_delayed = c('yes', 'yes', 'no', 'yes', 'yes','yes'), distance = c(1065,1628,719,569,2565,746))
 
#>   flight_no  carrier     dep_delay   is_delayed  distance
#> 1  515        B6            10          'yes'      1065
#> 2  4370       EV            95          'yes'      1628
#> 3  3730       MQ            -7          'no'       719
#> 4  4687       EV            4           'yes'      569
#> 5  1124       B6            6           'yes'      2565
#> 6  4137       EV            33          'yes'      746
....

I want to show 'the average departure delay for the carriers that made at least 2 flights. So, there are 2 flights by B6 and 3 flights by EV, which match this condition. the answer should be 8 for B6 and 44 for EV.

The code below gives me a table with the number of flights grouped by the carriers B6 and EV, and I need some help how to go from there (new to R).

avg_delay_2flights <-df %>%
  filter(is_delayed =='yes') %>% 
  group_by(carrier) %>%                           
  summarise(num_of_flights = n())%>% 
  filter(num_of_flights>=2)

View(avg_delay_2flights)

CodePudding user response:

Is this what you want?

avg_delay_2flights <- df %>%
  filter(is_delayed =='yes') %>% 
  group_by(carrier) %>%                           
  summarise(num_of_flights = n(), avg_dep_delay=mean(dep_delay))%>% 
  filter(num_of_flights>=2)

View(avg_delay_2flights)

#> carrier num_of_flights avg_dep_delay
#>  <chr>            <int>         <dbl>
#> 1 B6                   2             8
#> 2 EV                   3            44
  • Related