Home > front end >  How to calculate mean by row for multiple groups using dplyr in R?
How to calculate mean by row for multiple groups using dplyr in R?

Time:09-28

I have a dataframe with 4 columns Age, Location, Distance, and Value. Age and Location each have two possible values whereas Distance can have three. Value is the observed continuous variable which has been measured 3 times per Distance.

Accounting for Age and Location, I would like to calculate a mean for one of the Distance values and then calculate another mean Value when the other two Distance are combined. I am trying to answer, what is the mean Value for Distance 0.5 relative to Distance 1.5 & 2.5 for each Age and Location?

How can I do this using dplyr?

Example Data

library(dyplyr)

set.seed(123)
df1 <- data.frame(matrix(ncol = 4, nrow = 36))
x <- c("Age","Location","Distance","Value")
colnames(df1) <- x
df1$Age <- rep(c(1,2), each = 18)
df1$Location <- as.character(rep(c("Central","North"), each = 9))
df1$Distance <- rep(c(0.5,1.5,2.5), each = 3)
df1$Value <- round(rnorm(36,200,25),0)

Output should look something like this

  Age Location Mean_0.5 Mean_1.5_and_2.5
1   1  Central      206              202
2   1    North      210              201
3   2  Central      193              186
4   2    North      202              214

CodePudding user response:

We may use %in% or == to subset the 'Value' based on the 'Distance' values (assuming the precision is correct) after grouping by 'Age', 'Location'

library(dplyr)
df1 %>%
     group_by(Age, Location) %>% 
     summarise(Mean_0.5 = mean(Value[Distance == 0.5]), 
        Mean_1.5_and_2.5 = mean(Value[Distance %in% c(1.5, 2.5)]),
        .groups = 'drop')

-output

# A tibble: 4 × 4
    Age Location Mean_0.5 Mean_1.5_and_2.5
  <dbl> <chr>       <dbl>            <dbl>
1     1 Central      206.             202.
2     1 North        210.             201.
3     2 Central      193              186.
4     2 North        202.             214.
  • Related