R: get the mean of responses only if previous response is of a specific type

Time:04-04

I have been puzzling over how to get the mean of a response under certain conditions for quite some time now, and I would appreciate help from a clear mind.

    Trial <- c("1", "1", "2", "2", "3", "3", "4", "4","5", "5", "6", "6", "7", "7", "8", "8", "9", "9", "10", "10") 
    Session <- c("2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6", "2", "6") 
    Type <- c("x", "x", "x", "x", "y", "y", "x", "x", "y", "y", "y", "y", "x", "x", "y", "y", "y", "y", "x", "x") 
    Response <- c("3", "2", "2", "4", "2", "4", "6", "1", "3", "4", "2", "5", "1", "6", "5", "4", "6", "1", "3", "4") 
    df <- data.frame(Trial, Session, Type, Response)

I have a bunch of responses for several sessions. How can I get the mean of the "Response" for Session 2 of Type x but only if the previous "Response" is of Session 6 AND Type y?

Expected output is just the mean response (numeric).

Thank you for your time. If additional information is needed let me know.

Here is an example of the dataframe imported in R

CodePudding user response:

You can use dplyr::lag to get the lagged vectors for your conditional statements:

 mean(df$Response[which(df$Session == 2 & 
                        df$Type == "x" & 
                        dplyr::lag(df$Session) == 6 &
                        dplyr::lag(df$Type) == "y")])
#> [1] 3.333333

Created on 2022-04-03 by the reprex package (v2.0.1)
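
A note on the `which()` call: the first row has no previous row, so the lagged comparisons are `NA` there. Indexing with a logical `NA` puts an `NA` into the subset, which would make the mean `NA`. The sketch below illustrates this on a small made-up example; `lag1()` is a hypothetical base R stand-in for `dplyr::lag()` with its default shift of one, and the toy values are not taken from the question.

```r
# Hypothetical helper: shifts a vector down by one position, padding with NA
# (a base R stand-in for dplyr::lag() with n = 1).
lag1 <- function(x) c(NA, head(x, -1))

# Toy data, made up for illustration only
session  <- c(2, 6, 2, 6, 2)
type     <- c("x", "y", "x", "x", "x")
response <- c(4, 1, 5, 2, 3)

keep <- session == 2 & type == "x" &
  lag1(session) == 6 & lag1(type) == "y"
keep
#> [1]    NA FALSE  TRUE FALSE FALSE

# Logical indexing keeps the NA, so the mean is NA
mean(response[keep])
#> [1] NA

# which() returns only the positions that are TRUE, dropping the NA
mean(response[which(keep)])
#> [1] 5
```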


Data in reproducible format

df <- data.frame(Trial = rep(1:10, each = 2),
                 Session = rep(c(2, 6), 10),
                 Type = rep(rep(c("x", "y"), len = 7), 
                            times = c(4, 2, 2, 4, 2, 4, 2)),
                 Response = c(2, 4:6, 3, 2, 3, 3, 4, 2, 3, 4, 5, 2, 2, 3, 3,
                              4, 2, 3))

df
#>    Trial Session Type Response
#> 1      1       2    x        2
#> 2      1       6    x        4
#> 3      2       2    x        5
#> 4      2       6    x        6
#> 5      3       2    y        3
#> 6      3       6    y        2
#> 7      4       2    x        3
#> 8      4       6    x        3
#> 9      5       2    y        4
#> 10     5       6    y        2
#> 11     6       2    y        3
#> 12     6       6    y        4
#> 13     7       2    x        5
#> 14     7       6    x        2
#> 15     8       2    y        2
#> 16     8       6    y        3
#> 17     9       2    y        3
#> 18     9       6    y        4
#> 19    10       2    x        2
#> 20    10       6    x        3
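
The data frame in the question stores every column as character strings, so `Response` has to be converted before averaging (calling `mean()` on a character vector returns `NA` with a warning). Below is a minimal sketch of the same approach applied to the question's original vectors, assuming the rows are already in the order shown. The name `df_q`, the `as.numeric()` conversions, and `stringsAsFactors = FALSE` are additions for illustration.

```r
library(dplyr)

# Vectors exactly as posted in the question (all character)
Trial <- c("1", "1", "2", "2", "3", "3", "4", "4", "5", "5",
           "6", "6", "7", "7", "8", "8", "9", "9", "10", "10")
Session <- c("2", "6", "2", "6", "2", "6", "2", "6", "2", "6",
             "2", "6", "2", "6", "2", "6", "2", "6", "2", "6")
Type <- c("x", "x", "x", "x", "y", "y", "x", "x", "y", "y",
          "y", "y", "x", "x", "y", "y", "y", "y", "x", "x")
Response <- c("3", "2", "2", "4", "2", "4", "6", "1", "3", "4",
              "2", "5", "1", "6", "5", "4", "6", "1", "3", "4")

# stringsAsFactors = FALSE keeps the columns as character on older R versions
df_q <- data.frame(Trial, Session, Type, Response, stringsAsFactors = FALSE)

# Convert the columns used in numeric comparisons and in the mean
df_q$Session  <- as.numeric(df_q$Session)
df_q$Response <- as.numeric(df_q$Response)

# Positions of Session 2 / Type x rows preceded by a Session 6 / Type y row
hit <- which(df_q$Session == 2 &
             df_q$Type == "x" &
             dplyr::lag(df_q$Session) == 6 &
             dplyr::lag(df_q$Type) == "y")

result <- mean(df_q$Response[hit])
result
```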

CodePudding user response:

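Just for fun, here is another approach. The conditions are the same (see the code below).

Interestingly, if we replace

mutate(mean = ifelse(x == TRUE, sum(Response[x==TRUE])/ nrow(df[x==TRUE, ]), NA))

with

mutate(mean = ifelse(x == TRUE, mean(Response), NA))

we get mean = 3.25 instead, because mean(Response) is then computed over all 20 rows rather than only the rows where x is TRUE.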

library(dplyr)

df %>% 
  # flag Session 2 / Type x rows whose previous row is Session 6 / Type y
  mutate(x = case_when(
    Session == 2 & 
      Type == "x" & 
      lag(Session) == 6 &
      lag(Type) == "y" ~ TRUE,
    TRUE ~ FALSE
  )) %>% 
  # sum of the flagged responses divided by the number of flagged rows
  mutate(mean = ifelse(x == TRUE, sum(Response[x==TRUE])/
                                        nrow(df[x==TRUE, ]), NA)) %>% 
  filter(!is.na(mean)) %>% 
  distinct(mean)
#>       mean
#> 1 3.333333
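
The flag-then-average steps above can also be collapsed into a `filter()` followed by `summarise()`. This is only a sketch of an alternative, not the answer's original code. It rebuilds the reproducible data frame from the first answer so that it runs on its own. `filter()` evaluates `lag()` on the full data before dropping rows and discards rows where the condition is `NA`, so the first row needs no special handling.

```r
library(dplyr)

# Reproducible data from the first answer
df <- data.frame(Trial = rep(1:10, each = 2),
                 Session = rep(c(2, 6), 10),
                 Type = rep(rep(c("x", "y"), len = 7),
                            times = c(4, 2, 2, 4, 2, 4, 2)),
                 Response = c(2, 4:6, 3, 2, 3, 3, 4, 2, 3, 4, 5, 2, 2, 3, 3,
                              4, 2, 3))

result <- df %>%
  # keep Session 2 / Type x rows whose previous row is Session 6 / Type y
  filter(Session == 2, Type == "x",
         lag(Session) == 6, lag(Type) == "y") %>%
  # average only the rows that survived the filter
  summarise(mean = mean(Response)) %>%
  pull(mean)

result
#> [1] 3.333333
```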