I have a dataframe similar to this:
df <- data.frame(flight_no = c(515,4370,3730,4687,1124), dep_delay = c(-10, 95, -7, 4, 6), is_delayed = c('no', 'yes', 'no', 'yes', 'yes'), distance = c(1065,628,719,569,2565))
#>   flight_no dep_delay is_delayed distance
#> 1       515       -10         no     1065
#> 2      4370        95        yes      628
#> 3      3730        -7         no      719
#> 4      4687         4        yes      569
#> 5      1124         6        yes     2565
I need to find the average (mean) delay for flights going over 1000 miles and the average (mean) delay for flights going less than 1000 miles, considering only the delayed flights.
I have tried this:
library(dplyr)

df %>%
  filter(is_delayed == 'yes') %>%   # Keep delayed flights only
  group_by(distance > 1000) %>%     # Group by distance over 1000 miles
  summarise(avg = mean(dep_delay),  # Mean delay per group
            count = n())            # Number of flights per group
Output:
# A tibble: 2 × 3
  `distance > 1000`   avg count
  <lgl>             <dbl> <int>
1 FALSE              49.5     2
2 TRUE                6       1
It seems correct. Is there a way to change FALSE and TRUE to 'distance less than 1000' and 'distance more than 1000', respectively? Maybe there is a better way to do this. I'm new to R.
CodePudding user response:
You can conveniently use aggregate() for that.
aggregate(dep_delay ~ distance > 1000, df, subset = is_delayed == 'yes',
          \(x) c(mean = mean(x), n = length(x)))
#   distance > 1000 dep_delay.mean dep_delay.n
# 1           FALSE           49.5         2.0
# 2            TRUE            6.0         1.0
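The aggregate() call above still labels the groups FALSE and TRUE. As a rough sketch of one way to get readable labels in base R, the grouping could be stored in a column first. The column name dist_group and the label strings are assumptions chosen for illustration, not something from the original answer:

```r
# Example data from the question
df <- data.frame(flight_no  = c(515, 4370, 3730, 4687, 1124),
                 dep_delay  = c(-10, 95, -7, 4, 6),
                 is_delayed = c("no", "yes", "no", "yes", "yes"),
                 distance   = c(1065, 628, 719, 569, 2565))

# dist_group is a hypothetical helper column holding a readable label
df$dist_group <- ifelse(df$distance > 1000,
                        "distance more than 1000",
                        "distance less or equal to 1000")

# Mean delay per labelled group, delayed flights only
res <- aggregate(dep_delay ~ dist_group, data = df,
                 subset = is_delayed == "yes", FUN = mean)
print(res)
```

Note that a flight of exactly 1000 miles falls into the "less or equal" group here, since the condition is distance > 1000.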
CodePudding user response:
You can use ifelse() to change the levels, and round() to round the values.
df %>%
  filter(is_delayed == "yes") %>%
  group_by(distance_1000 = ifelse(distance > 1000,
                                  "distance more than 1000",
                                  "distance less or equal to 1000")) %>%
  summarise(avg = round(mean(dep_delay), 2),
            count = n())
#   distance_1000                    avg count
# 1 distance less or equal to 1000  49.5     2
# 2 distance more than 1000          6.0     1
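If more than two distance bands are ever needed, nested ifelse() calls get awkward. A possible alternative, sketched here under the assumption that dplyr is installed, is base R's cut(), which bins a numeric vector and attaches labels. The break points and label strings below are illustrative choices, not part of the original answer:

```r
library(dplyr)

# Example data from the question
df <- data.frame(flight_no  = c(515, 4370, 3730, 4687, 1124),
                 dep_delay  = c(-10, 95, -7, 4, 6),
                 is_delayed = c("no", "yes", "no", "yes", "yes"),
                 distance   = c(1065, 628, 719, 569, 2565))

res <- df %>%
  filter(is_delayed == "yes") %>%
  # cut() uses right-closed intervals by default: (-Inf, 1000] and (1000, Inf)
  group_by(distance_1000 = cut(distance,
                               breaks = c(-Inf, 1000, Inf),
                               labels = c("distance less or equal to 1000",
                                          "distance more than 1000"))) %>%
  summarise(avg = round(mean(dep_delay), 2),
            count = n())
print(res)
```

Adding a band would then only mean adding one break point and one label, and the result is a factor, so the groups keep the order of the breaks rather than alphabetical order.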