I have been puzzling over how to get the mean of a response under certain conditions for quite some time now, and I would appreciate help from anyone with a clear mind.
Trial <- c("1", "1", "2", "2", "3", "3", "4", "4","5", "5", "6", "6", "7", "7", "8", "8", "9", "9", "10", "10")
Session <- c("2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6")
Type <- c("x", "x", "x", "x", "y", "y", "x", "x", "y", "y", "y", "y", "x", "x", "y", "y", "y", "y", "x", "x")
Response <- c("3", "2", "2", "4", "2", "4", "6", "1", "3", "4", "2", "5", "1", "6", "5", "4", "6", "1", "3", "4")
df <- data.frame(Trial, Session, Type, Response)
I have a bunch of responses for several sessions. How can I get the mean of the "Response" for rows of Session 2 and Type x, but only if the previous row is of Session 6 AND Type y?
Expected output is just the mean response (numeric).
Thank you for your time. If additional information is needed let me know.
CodePudding user response:
You can use dplyr::lag to get the lagged vectors for your conditional statements:
mean(df$Response[which(df$Session == 2 &
                       df$Type == "x" &
                       dplyr::lag(df$Session) == 6 &
                       dplyr::lag(df$Type) == "y")])
#> [1] 3.333333
Created on 2022-04-03 by the reprex package (v2.0.1)
Data in reproducible format
df <- data.frame(Trial    = rep(1:10, each = 2),
                 Session  = rep(c(2, 6), 10),
                 Type     = rep(rep(c("x", "y"), len = 7),
                                times = c(4, 2, 2, 4, 2, 4, 2)),
                 Response = c(2, 4:6, 3, 2, 3, 3, 4, 2, 3, 4, 5, 2, 2, 3, 3,
                              4, 2, 3))
df
#>    Trial Session Type Response
#> 1      1       2    x        2
#> 2      1       6    x        4
#> 3      2       2    x        5
#> 4      2       6    x        6
#> 5      3       2    y        3
#> 6      3       6    y        2
#> 7      4       2    x        3
#> 8      4       6    x        3
#> 9      5       2    y        4
#> 10     5       6    y        2
#> 11     6       2    y        3
#> 12     6       6    y        4
#> 13     7       2    x        5
#> 14     7       6    x        2
#> 15     8       2    y        2
#> 16     8       6    y        3
#> 17     9       2    y        3
#> 18     9       6    y        4
#> 19    10       2    x        2
#> 20    10       6    x        3
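Note that the data frame as posted in the question stores every column as character, so calling mean() on its Response column directly would return NA with a warning. The following is a minimal base R sketch, assuming the question's own values, that converts Response to numeric first and builds the "previous row" vectors by hand instead of using dplyr::lag. The names df_q, prev_session, prev_type, keep and result are chosen here for illustration only.

```r
# Data as posted in the question: all columns are character vectors
df_q <- data.frame(
  Session  = rep(c("2", "6"), 10),
  Type     = c("x", "x", "x", "x", "y", "y", "x", "x", "y", "y",
               "y", "y", "x", "x", "y", "y", "y", "y", "x", "x"),
  Response = c("3", "2", "2", "4", "2", "4", "6", "1", "3", "4",
               "2", "5", "1", "6", "5", "4", "6", "1", "3", "4"),
  stringsAsFactors = FALSE
)

# mean() needs numbers, so convert the responses
df_q$Response <- as.numeric(df_q$Response)

# Base R equivalent of a lag by one row: shift down and pad with NA
prev_session <- c(NA, head(df_q$Session, -1))
prev_type    <- c(NA, head(df_q$Type, -1))

keep <- df_q$Session == "2" & df_q$Type == "x" &
  prev_session == "6" & prev_type == "y"

# which() drops the NA produced for the first row
result <- mean(df_q$Response[which(keep)])
result
#> [1] 3.333333
```

With the question's values the matching rows are 7, 13 and 19 (responses 6, 1 and 3), which happens to give the same mean as the rebuilt data above.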
CodePudding user response:
Just for fun, here is another approach. The conditions are the same:
library(dplyr)
df %>%
  mutate(x = case_when(
    Session == 2 &
      Type == "x" &
      lag(Session) == 6 &
      lag(Type) == "y" ~ TRUE,
    TRUE ~ FALSE
  )) %>%
  mutate(mean = ifelse(x == TRUE, sum(Response[x == TRUE]) /
                         nrow(df[x == TRUE, ]), NA)) %>%
  filter(is.na(mean) == FALSE) %>%
  distinct(mean)
#>       mean
#> 1 3.333333
Interestingly, if we replace
mutate(mean = ifelse(x == TRUE, sum(Response[x == TRUE]) / nrow(df[x == TRUE, ]), NA))
by
mutate(mean = ifelse(x == TRUE, mean(Response), NA))
we get mean = 3.25 instead, because mean(Response) is then computed over all 20 rows rather than only the rows where x is TRUE.
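A shorter variant of the same idea, offered as a minimal sketch rather than a definitive solution: filter on the conditions first and summarise afterwards, so that mean() only ever sees the matching rows. It assumes the numeric df from the first answer; overall_mean and conditional_mean are names introduced here for illustration.

```r
library(dplyr)

# Numeric data as rebuilt in the first answer
df <- data.frame(Trial    = rep(1:10, each = 2),
                 Session  = rep(c(2, 6), 10),
                 Type     = rep(rep(c("x", "y"), len = 7),
                                times = c(4, 2, 2, 4, 2, 4, 2)),
                 Response = c(2, 4:6, 3, 2, 3, 3, 4, 2, 3, 4, 5, 2, 2, 3, 3,
                              4, 2, 3))

# Mean over every row: this is where 3.25 comes from
overall_mean <- mean(df$Response)

# Keep only the rows meeting all four conditions, then average them.
# filter() drops the first row, where lag() yields NA.
conditional_mean <- df %>%
  filter(Session == 2, Type == "x",
         lag(Session) == 6, lag(Type) == "y") %>%
  summarise(mean = mean(Response)) %>%
  pull(mean)

overall_mean
#> [1] 3.25
conditional_mean
#> [1] 3.333333
```

Because lag() is evaluated inside filter() before any rows are removed, "previous" still refers to the previous row of the original data.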