I have the following sample dataframe. The first column is the month and the second column is the number of surveys conducted each month.
month = c(1,2,3,4,5,6,7,8,9,10,11,12)
surveys = c(4,5,3,7,3,4,4,4,6,1,1,7)
df = data.frame(month, surveys)
I want to calculate the average number of surveys from May - August, and then, the average number of surveys for the remaining months (Jan - April PLUS September - December).
How do I do this using the dplyr package?
CodePudding user response:
Assuming the integers represent months, in dplyr you could use group_by with a logical (TRUE/FALSE) condition and find the mean of each group with summarize:
library(dplyr)
df %>% group_by(MayAug = month %in% 5:8) %>% summarize(mean = mean(surveys))
# MayAug mean
# <lgl> <dbl>
#1 FALSE 4.25
#2 TRUE 3.75
CodePudding user response:
I first create a new factor variable period with labels, then group_by period and summarise using mean:
df %>%
  mutate(period = factor(between(month, 5, 8), labels = c("Other months", "May-Aug"))) %>%
  group_by(period) %>%
  summarise(mean_surveys = mean(surveys))
# A tibble: 2 × 2
#   period       mean_surveys
#   <fct>               <dbl>
# 1 Other months         4.25
# 2 May-Aug              3.75
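One detail worth spelling out: factor() sorts the levels of a logical vector as FALSE, then TRUE, so the labels are matched in that order. A minimal sketch that makes the mapping explicit by passing levels as well, assuming the same sample data (the result name is just for illustration):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  month   = 1:12,
  surveys = c(4, 5, 3, 7, 3, 4, 4, 4, 6, 1, 1, 7)
)

result <- df %>%
  # FALSE -> "Other months", TRUE -> "May-Aug", stated explicitly via levels
  mutate(period = factor(between(month, 5, 8),
                         levels = c(FALSE, TRUE),
                         labels = c("Other months", "May-Aug"))) %>%
  group_by(period) %>%
  summarise(mean_surveys = mean(surveys))

print(result)
```

Stating the levels guards against accidentally swapping the two labels if the condition is later rewritten.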