How to obtain the average of two different row ranges in a specific column?

Time:01-18

I have the following sample data frame. The first column is the month and the second column is the number of surveys conducted in that month.

month = c(1,2,3,4,5,6,7,8,9,10,11,12)
surveys = c(4,5,3,7,3,4,4,4,6,1,1,7)

df = data.frame(month, surveys)

I want to calculate the average number of surveys for May to August, and then the average number of surveys for the remaining months (January to April plus September to December).

How do I do this using the dplyr package?
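For reference, the two target averages can be worked out by hand from the sample data above (May to August are months 5 to 8; the other eight months make up the second group):

$$\bar{s}_{\text{May-Aug}} = \frac{3 + 4 + 4 + 4}{4} = \frac{15}{4} = 3.75$$

$$\bar{s}_{\text{other}} = \frac{4 + 5 + 3 + 7 + 6 + 1 + 1 + 7}{8} = \frac{34}{8} = 4.25$$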

Answer:

Assuming the integers represent months, in dplyr you could group by a logical TRUE/FALSE flag for whether the month falls in May to August (month %in% 5:8), then compute the mean with summarize:

library(dplyr)

df %>% group_by(MayAug = month %in% 5:8) %>% summarize(mean = mean(surveys))

# # A tibble: 2 × 2
#   MayAug  mean
#   <lgl>  <dbl>
# 1 FALSE   4.25
# 2 TRUE    3.75
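If you would rather have both averages side by side in a single row instead of one row per group, a minimal sketch (still dplyr; the column names mean_may_aug and mean_other are only illustrative choices, not anything from the question) is to flag the months once and subset surveys inside summarise():

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  month   = 1:12,
  surveys = c(4, 5, 3, 7, 3, 4, 4, 4, 6, 1, 1, 7)
)

# Flag May-August once, then subset surveys by that flag inside summarise()
result <- df %>%
  mutate(may_aug = month %in% 5:8) %>%
  summarise(
    mean_may_aug = mean(surveys[may_aug]),   # months 5 to 8
    mean_other   = mean(surveys[!may_aug])   # all remaining months
  )

print(result)
# mean_may_aug = 3.75, mean_other = 4.25
```

This trades the grouped, long-format result for a wide one, which can be handier if the two averages feed straight into a comparison or a report.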

Answer:

I first create a new factor variable, period, with descriptive labels, then group by period and summarise the surveys with mean():

library(dplyr)

df %>% 
  mutate(period = factor(between(month, 5, 8), labels = c("Other months", "May-Aug"))) %>% 
  group_by(period) %>% 
  summarise(mean_surveys = mean(surveys))

# A tibble: 2 × 2
  period       mean_surveys
  <fct>               <dbl>
1 Other months         4.25
2 May-Aug              3.75
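A possible variant, sketched here as an alternative rather than a correction: if_else() assigns the label row by row. Because factor() needs one label per observed level, the factor(..., labels = ...) version would raise an error on a subset of the data that contains only one of the two periods, whereas this version would simply return a single group. The labels are the same ones used above.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  month   = 1:12,
  surveys = c(4, 5, 3, 7, 3, 4, 4, 4, 6, 1, 1, 7)
)

# Label each row directly; between() is inclusive, so months 5, 6, 7, 8 are May-Aug
result <- df %>%
  mutate(period = if_else(between(month, 5, 8), "May-Aug", "Other months")) %>%
  group_by(period) %>%
  summarise(mean_surveys = mean(surveys))

print(result)
```

Note that period is now a character column, so the groups are sorted alphabetically ("May-Aug" before "Other months") rather than in factor-level order.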