Home > Net >  dplyr conditional mean of subset of group
dplyr conditional mean of subset of group

Time:09-29

My R dataframe looks like

df <- data.frame(
  Date1 = rep(seq.Date(from = as.Date("2022-08-01"), to = as.Date("2022-09-12"), by = 7), 2), 
  Date2 = as.Date(c(rep("2022-08-15", times = 7), rep("2022-08-29", times = 7))), 
  Value = 1:14)
Date1 Date2 Value
2022-08-01 2022-08-15 1
2022-08-08 2022-08-15 2
...
2022-08-15 2022-08-29 10
2022-08-22 2022-08-29 11
...

Using a magrittr dplyr chain, I want to add a new column that groups the data by Date2 (and any other grouping columns) and shows the mean Value for that group when Date1 is less than Date2 and greater than "Date2 - 2 weeks."

Date1 Date2 Value CondGrpMean
2022-08-01 2022-08-15 1 1.5
2022-08-08 2022-08-15 2 1.5
...
2022-08-15 2022-08-29 10 10.5
2022-08-22 2022-08-29 11 10.5
...

I'm looking at dplyr::across(), dplyr::group_modify(), but can't figure this out in anything like easy syntax that does not require a join.

I'm open to other straightforward solutions, but my current ideal answer would look something like

df %>%
  dplyr::group_by(Date2) %>%
  dplyr::mutate(CondGrpMean = mean(Value %>% 
    dplyr::filter(Date1 < Date2 & 
      Date1 > Date2 - 14)))

The output values for CondGrpMean should be 1.5 for rows 1 - 7 and 10.5 for rows 8 - 14.

CodePudding user response:

1) There is no filter method for numeric vectors. Subscript Value as shown instead:

library(dplyr)

df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(Value[Date1 >= Date2-14 & Date1 < Date2])) %>%
  ungroup

giving:

# A tibble: 14 × 4
   Date1      Date2      Value CondGrpMean
   <date>     <date>     <int>       <dbl>
 1 2022-08-01 2022-08-15     1         1.5
 2 2022-08-08 2022-08-15     2         1.5
 3 2022-08-15 2022-08-15     3         1.5
 4 2022-08-22 2022-08-15     4         1.5
 5 2022-08-29 2022-08-15     5         1.5
 6 2022-09-05 2022-08-15     6         1.5
 7 2022-09-12 2022-08-15     7         1.5
 8 2022-08-01 2022-08-29     8        10.5
 9 2022-08-08 2022-08-29     9        10.5
10 2022-08-15 2022-08-29    10        10.5
11 2022-08-22 2022-08-29    11        10.5
12 2022-08-29 2022-08-29    12        10.5
13 2022-09-05 2022-08-29    13        10.5
14 2022-09-12 2022-08-29    14        10.5

1a) A variation of this is:

df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(Value[c(Date2 - Date1) %in% 1:14])) %>%
  ungroup

2) With base R:

Mean <- function(ix) with(df[ix, ], mean(Value[Date1 >= Date2-14 & Date1 < Date2]))
transform(df, CondGrpMean = ave(1:nrow(df), Date2, FUN = Mean))

CodePudding user response:

Add your condition as a column and use it to subset the Value column that you take the mean of:

library(lubridate)
df %>%
  group_by(Date2) %>%
  mutate(
    cond =  Date1 < Date2 & Date1 >= (Date2 - weeks(2)),
    CondGrpMean = mean(Value[cond])
)
# # A tibble: 14 × 5
# # Groups:   Date2 [2]
#    Date1      Date2      Value cond  CondGrpMean
#    <date>     <date>     <int> <lgl>       <dbl>
#  1 2022-08-01 2022-08-15     1 TRUE          1.5
#  2 2022-08-08 2022-08-15     2 TRUE          1.5
#  3 2022-08-15 2022-08-15     3 FALSE         1.5
#  4 2022-08-22 2022-08-15     4 FALSE         1.5
#  5 2022-08-29 2022-08-15     5 FALSE         1.5
#  6 2022-09-05 2022-08-15     6 FALSE         1.5
#  7 2022-09-12 2022-08-15     7 FALSE         1.5
#  8 2022-08-01 2022-08-29     8 FALSE        10.5
#  9 2022-08-08 2022-08-29     9 FALSE        10.5
# 10 2022-08-15 2022-08-29    10 TRUE         10.5
# 11 2022-08-22 2022-08-29    11 TRUE         10.5
# 12 2022-08-29 2022-08-29    12 FALSE        10.5
# 13 2022-09-05 2022-08-29    13 FALSE        10.5
# 14 2022-09-12 2022-08-29    14 FALSE        10.5

CodePudding user response:

I think you are looking for something like this. You can deal with your conditions using ifelse and only include the Value you want by assigning others NA. Then use na.rm in your mean to get the correct values:

library(dplyr)
df <- data.frame(
  Date1 = rep(seq.Date(from = as.Date("2022-08-01"), to = as.Date("2022-09-12"), by = 7), 2), 
  Date2 = as.Date(c(rep("2022-08-15", times = 7), rep("2022-08-29", times = 7))), 
  Value = 1:14)

df %>%
  group_by(Date2) %>%
  mutate(CondGrpMean = mean(ifelse(Date1 < Date2 & Date1 >= Date2 - 14, 
                                   Value, 
                                   NA), na.rm = T
                            )
         )
#> # A tibble: 14 × 4
#> # Groups:   Date2 [2]
#>    Date1      Date2      Value CondGrpMean
#>    <date>     <date>     <int>       <dbl>
#>  1 2022-08-01 2022-08-15     1         1.5
#>  2 2022-08-08 2022-08-15     2         1.5
#>  3 2022-08-15 2022-08-15     3         1.5
#>  4 2022-08-22 2022-08-15     4         1.5
#>  5 2022-08-29 2022-08-15     5         1.5
#>  6 2022-09-05 2022-08-15     6         1.5
#>  7 2022-09-12 2022-08-15     7         1.5
#>  8 2022-08-01 2022-08-29     8        10.5
#>  9 2022-08-08 2022-08-29     9        10.5
#> 10 2022-08-15 2022-08-29    10        10.5
#> 11 2022-08-22 2022-08-29    11        10.5
#> 12 2022-08-29 2022-08-29    12        10.5
#> 13 2022-09-05 2022-08-29    13        10.5
#> 14 2022-09-12 2022-08-29    14        10.5
  • Related