I have a dataframe like this.
df <- data.frame(flight_no = c(515,4370,3730,4687,1124, 4137), carrier = c('B6','EV','MQ','EV','B6','EV'), dep_delay = c(10, 95, -7, 4, 6, 33), is_delayed = c('yes', 'yes', 'no', 'yes', 'yes','yes'), distance = c(1065,1628,719,569,2565,746))
#> flight_no carrier dep_delay is_delayed distance
#> 1 515 B6 10 'yes' 1065
#> 2 4370 EV 95 'yes' 1628
#> 3 3730 MQ -7 'no' 719
#> 4 4687 EV 4 'yes' 569
#> 5 1124 B6 6 'yes' 2565
#> 6 4137 EV 33 'yes' 746
....
I want to show 'the average departure delay for the carriers that made at least 2 flights. So, there are 2 flights by B6 and 3 flights by EV, which match this condition. the answer should be 8 for B6 and 44 for EV.
The code below gives me a table with the number of flights grouped by the carriers B6 and EV, and I need some help how to go from there (new to R).
avg_delay_2flights <-df %>%
filter(is_delayed =='yes') %>%
group_by(carrier) %>%
summarise(num_of_flights = n())%>%
filter(num_of_flights>=2)
View(avg_delay_2flights)
CodePudding user response:
Is this what you want?
avg_delay_2flights <- df %>%
filter(is_delayed =='yes') %>%
group_by(carrier) %>%
summarise(num_of_flights = n(), avg_dep_delay=mean(dep_delay))%>%
filter(num_of_flights>=2)
View(avg_delay_2flights)
#> carrier num_of_flights avg_dep_delay
#> <chr> <int> <dbl>
#> 1 B6 2 8
#> 2 EV 3 44