My R dataframe looks like
df <- data.frame(
Date1 = rep(seq.Date(from = as.Date("2022-08-01"), to = as.Date("2022-09-12"), by = 7), 2),
Date2 = as.Date(c(rep("2022-08-15", times = 7), rep("2022-08-29", times = 7))),
Value = 1:14)
Date1 | Date2 | Value |
---|---|---|
2022-08-01 | 2022-08-15 | 1 |
2022-08-08 | 2022-08-15 | 2 |
... | ||
2022-08-15 | 2022-08-29 | 10 |
2022-08-22 | 2022-08-29 | 11 |
... |
Using a magrittr
dplyr
chain, I want to add a new column that groups the data by Date2
(and any other grouping columns) and shows the mean Value
for that group when Date1
is less than Date2
and greater than "Date2
- 2 weeks."
Date1 | Date2 | Value | CondGrpMean |
---|---|---|---|
2022-08-01 | 2022-08-15 | 1 | 1.5 |
2022-08-08 | 2022-08-15 | 2 | 1.5 |
... | |||
2022-08-15 | 2022-08-29 | 10 | 10.5 |
2022-08-22 | 2022-08-29 | 11 | 10.5 |
... |
I'm looking at dplyr::across()
, dplyr::group_modify()
, but can't figure this out in anything like easy syntax that does not require a join.
I'm open to other straightforward solutions, but my current ideal answer would look something like
df %>%
dplyr::group_by(Date2) %>%
dplyr::mutate(CondGrpMean = mean(Value %>%
dplyr::filter(Date1 < Date2 &
Date1 > Date2 - 14)))
The output values for CondGrpMean
should be 1.5 for rows 1 - 7 and 10.5 for rows 8 - 14.
CodePudding user response:
1) There is no filter
method for numeric vectors. Subscript Value
as shown instead:
library(dplyr)
df %>%
group_by(Date2) %>%
mutate(CondGrpMean = mean(Value[Date1 >= Date2-14 & Date1 < Date2])) %>%
ungroup
giving:
# A tibble: 14 × 4
Date1 Date2 Value CondGrpMean
<date> <date> <int> <dbl>
1 2022-08-01 2022-08-15 1 1.5
2 2022-08-08 2022-08-15 2 1.5
3 2022-08-15 2022-08-15 3 1.5
4 2022-08-22 2022-08-15 4 1.5
5 2022-08-29 2022-08-15 5 1.5
6 2022-09-05 2022-08-15 6 1.5
7 2022-09-12 2022-08-15 7 1.5
8 2022-08-01 2022-08-29 8 10.5
9 2022-08-08 2022-08-29 9 10.5
10 2022-08-15 2022-08-29 10 10.5
11 2022-08-22 2022-08-29 11 10.5
12 2022-08-29 2022-08-29 12 10.5
13 2022-09-05 2022-08-29 13 10.5
14 2022-09-12 2022-08-29 14 10.5
1a) A variation of this is:
df %>%
group_by(Date2) %>%
mutate(CondGrpMean = mean(Value[c(Date2 - Date1) %in% 1:14])) %>%
ungroup
2) With base R:
Mean <- function(ix) with(df[ix, ], mean(Value[Date1 >= Date2-14 & Date1 < Date2]))
transform(df, CondGrpMean = ave(1:nrow(df), Date2, FUN = Mean))
CodePudding user response:
Add your condition as a column and use it to subset the Value
column that you take the mean
of:
library(lubridate)
df %>%
group_by(Date2) %>%
mutate(
cond = Date1 < Date2 & Date1 >= (Date2 - weeks(2)),
CondGrpMean = mean(Value[cond])
)
# # A tibble: 14 × 5
# # Groups: Date2 [2]
# Date1 Date2 Value cond CondGrpMean
# <date> <date> <int> <lgl> <dbl>
# 1 2022-08-01 2022-08-15 1 TRUE 1.5
# 2 2022-08-08 2022-08-15 2 TRUE 1.5
# 3 2022-08-15 2022-08-15 3 FALSE 1.5
# 4 2022-08-22 2022-08-15 4 FALSE 1.5
# 5 2022-08-29 2022-08-15 5 FALSE 1.5
# 6 2022-09-05 2022-08-15 6 FALSE 1.5
# 7 2022-09-12 2022-08-15 7 FALSE 1.5
# 8 2022-08-01 2022-08-29 8 FALSE 10.5
# 9 2022-08-08 2022-08-29 9 FALSE 10.5
# 10 2022-08-15 2022-08-29 10 TRUE 10.5
# 11 2022-08-22 2022-08-29 11 TRUE 10.5
# 12 2022-08-29 2022-08-29 12 FALSE 10.5
# 13 2022-09-05 2022-08-29 13 FALSE 10.5
# 14 2022-09-12 2022-08-29 14 FALSE 10.5
CodePudding user response:
I think you are looking for something like this. You can deal with your conditions using ifelse
and only include the Value
you want by assigning others NA
. Then use na.rm
in your mean
to get the correct values:
library(dplyr)
df <- data.frame(
Date1 = rep(seq.Date(from = as.Date("2022-08-01"), to = as.Date("2022-09-12"), by = 7), 2),
Date2 = as.Date(c(rep("2022-08-15", times = 7), rep("2022-08-29", times = 7))),
Value = 1:14)
df %>%
group_by(Date2) %>%
mutate(CondGrpMean = mean(ifelse(Date1 < Date2 & Date1 >= Date2 - 14,
Value,
NA), na.rm = T
)
)
#> # A tibble: 14 × 4
#> # Groups: Date2 [2]
#> Date1 Date2 Value CondGrpMean
#> <date> <date> <int> <dbl>
#> 1 2022-08-01 2022-08-15 1 1.5
#> 2 2022-08-08 2022-08-15 2 1.5
#> 3 2022-08-15 2022-08-15 3 1.5
#> 4 2022-08-22 2022-08-15 4 1.5
#> 5 2022-08-29 2022-08-15 5 1.5
#> 6 2022-09-05 2022-08-15 6 1.5
#> 7 2022-09-12 2022-08-15 7 1.5
#> 8 2022-08-01 2022-08-29 8 10.5
#> 9 2022-08-08 2022-08-29 9 10.5
#> 10 2022-08-15 2022-08-29 10 10.5
#> 11 2022-08-22 2022-08-29 11 10.5
#> 12 2022-08-29 2022-08-29 12 10.5
#> 13 2022-09-05 2022-08-29 13 10.5
#> 14 2022-09-12 2022-08-29 14 10.5