I have a dataframe with 4 columns Age
, Location
, Distance
, and Value
. Age
and Location
each have two possible values whereas Distance
can have three. Value
is the observed continuous variable which has been measured 3 times per Distance
.
Accounting for Age
and Location
, I would like to calculate a mean for one of the Distance
values and then calculate another mean Value
when the other two Distance
are combined. I am trying to answer, what is the mean Value
for Distance
0.5 relative to Distance
1.5 & 2.5 for each Age
and Location
?
How can I do this using dplyr?
Example Data
library(dyplyr)
set.seed(123)
df1 <- data.frame(matrix(ncol = 4, nrow = 36))
x <- c("Age","Location","Distance","Value")
colnames(df1) <- x
df1$Age <- rep(c(1,2), each = 18)
df1$Location <- as.character(rep(c("Central","North"), each = 9))
df1$Distance <- rep(c(0.5,1.5,2.5), each = 3)
df1$Value <- round(rnorm(36,200,25),0)
Output should look something like this
Age Location Mean_0.5 Mean_1.5_and_2.5
1 1 Central 206 202
2 1 North 210 201
3 2 Central 193 186
4 2 North 202 214
CodePudding user response:
We may use %in%
or ==
to subset the 'Value' based on the 'Distance' values (assuming the precision is correct) after grouping by 'Age', 'Location'
library(dplyr)
df1 %>%
group_by(Age, Location) %>%
summarise(Mean_0.5 = mean(Value[Distance == 0.5]),
Mean_1.5_and_2.5 = mean(Value[Distance %in% c(1.5, 2.5)]),
.groups = 'drop')
-output
# A tibble: 4 × 4
Age Location Mean_0.5 Mean_1.5_and_2.5
<dbl> <chr> <dbl> <dbl>
1 1 Central 206. 202.
2 1 North 210. 201.
3 2 Central 193 186.
4 2 North 202. 214.