Background: I have data from a simulation with a few variables and therefore many parameter combinations. Because of the simulation's internal design, runs with identical parameters can produce slightly different outcomes, so I repeat each parameter set several times and calculate the min, max, and mean score across those runs. I then want to compare the treatment and no-treatment conditions:
- calculate the mean score of treatment minus the mean score of no-treatment
- calculate the min score of treatment minus the max score of no-treatment
- calculate the max score of treatment minus the min score of no-treatment
This gives me the mean difference as well as the worst- and best-case bounds of the comparison.
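Written out for a single parameter combination, with s_{T,i} and s_{C,j} denoting the scores of the individual treatment and no-treatment runs (notation introduced here for illustration, not from the original post), the three quantities are:

```latex
\begin{aligned}
\Delta_{\text{mean}}  &= \bar{s}_T - \bar{s}_C \\
\Delta_{\text{worst}} &= \min_i s_{T,i} - \max_j s_{C,j} \\
\Delta_{\text{best}}  &= \max_i s_{T,i} - \min_j s_{C,j} \\
\Delta_{\text{worst}} &\le \Delta_{\text{mean}} \le \Delta_{\text{best}}
\end{aligned}
```

The last line holds because each mean lies between the corresponding min and max, so the mean difference always falls inside the two bounds.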
Example data:
library(tibble)  # provides tribble()
my_data <- tribble(
~params, ~treatment, ~mean_score, ~min_score, ~max_score,
"combo a", 0, 91, 90, 92,
"combo a", 1, 92, 92, 92,
"combo b", 0, 89, 87, 91,
"combo b", 1, 92, 89, 92,
"combo c", 0, 90, 90, 90,
"combo c", 1, 89, 85, 93,
)
I'm blowing the dust off my R skills. My initial attempt is below, but I don't know how to tell summarize which row should be subtracted from which within each group.
Code attempt (which I know doesn't work):
library(dplyr)  # attaches %>%
my_summ_data <- my_data %>%
  dplyr::group_by(params = as.factor(params)) %>%
  dplyr::summarize(hier_diff = diff(mean_score),
                   min_max_diff = diff(c(min_score, max_score)),
                   max_min_diff = diff(c(max_score, min_score)))
I would like to get
params | hier_diff | min_max_diff | max_min_diff |
---|---|---|---|
combo a | 1 | 0 | 2 |
combo b | 3 | -2 | 5 |
combo c | -1 | -5 | 3 |
but instead I get the following (by the way, I don't yet understand why I get these extra rows):
params | hier_diff | min_max_diff | max_min_diff |
---|---|---|---|
combo a | 1 | 2 | 0 |
combo a | 1 | 0 | -2 |
combo a | 1 | 0 | 2 |
combo b | 1 | 2 | 0 |
combo b | 1 | 2 | -4 |
combo b | 1 | 0 | 2 |
combo c | 2 | -2 | 6 |
combo c | 2 | 2 | -6 |
combo c | 2 | 6 | -2 |
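The extra rows come from how summarize() handles results longer than one value: since dplyr 1.0.0 it keeps all of them and adds rows to the group (dplyr 1.1.0 deprecates this in favour of reframe()). Within each two-row group, diff(mean_score) returns a single value, but diff(c(min_score, max_score)) sees four numbers and returns three differences, so every group grows to three rows and the single hier_diff value is recycled. A base-R sketch using the combo a rows of the example data:

```r
# combo a rows from the example data: no-treatment first, then treatment
mean_score <- c(91, 92)
min_score  <- c(90, 92)
max_score  <- c(92, 92)

diff(mean_score)               # one value: 92 - 91 = 1
diff(c(min_score, max_score))  # diff(c(90, 92, 92, 92)) gives three values: 2 0 0
diff(c(max_score, min_score))  # diff(c(92, 92, 90, 92)) gives three values: 0 -2 2
```

For combo a these vectors match the min_max_diff and max_min_diff columns in the table above. They are differences between neighbouring elements of the concatenated vector, not between treatment and no-treatment.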
I'm not convinced there is a sensible way to do what I want using summarize. But if there is, I would like to know it, and if not, what is the next best alternative?
Answer 1:
Please find below one possible solution.
Reprex
- Code
library(dplyr)
library(tibble)
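my_summ_data <- my_data %>%
  dplyr::group_by(params) %>%
  # arrange() ignores the grouping by default, but sorting the whole table by
  # treatment still puts each group's no-treatment row (0) before its treatment row (1)
  dplyr::arrange(treatment) %>%
  dplyr::summarize(hier_diff = diff(mean_score),                       # treatment mean - no-treatment mean
                   min_max_diff = diff(c(max_score[1], min_score[2])), # treatment min - no-treatment max
                   max_min_diff = diff(c(min_score[1], max_score[2]))) # treatment max - no-treatment min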
- Output
my_summ_data
#> # A tibble: 3 x 4
#> params hier_diff min_max_diff max_min_diff
#> <chr> <dbl> <dbl> <dbl>
#> 1 combo a 1 0 2
#> 2 combo b 3 -2 5
#> 3 combo c -1 -5 3
Created on 2022-02-14 by the reprex package (v2.0.1)
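Another option, not taken from either answer here, is to reshape the data so each parameter combination becomes a single row and the differences become plain column arithmetic. The sketch below assumes tidyr (1.0 or later, for pivot_wider()) is installed and that every combination has exactly one row per treatment value; the name my_summ_wide is just for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

my_data <- tribble(
  ~params, ~treatment, ~mean_score, ~min_score, ~max_score,
  "combo a", 0, 91, 90, 92,
  "combo a", 1, 92, 92, 92,
  "combo b", 0, 89, 87, 91,
  "combo b", 1, 92, 89, 92,
  "combo c", 0, 90, 90, 90,
  "combo c", 1, 89, 85, 93
)

# One row per combination; pivot_wider() builds column names such as
# mean_score_0 / mean_score_1 from the value column plus the treatment value
my_summ_wide <- my_data %>%
  pivot_wider(names_from = treatment,
              values_from = c(mean_score, min_score, max_score)) %>%
  mutate(hier_diff    = mean_score_1 - mean_score_0,
         min_max_diff = min_score_1 - max_score_0,
         max_min_diff = max_score_1 - min_score_0) %>%
  select(params, hier_diff, min_max_diff, max_min_diff)

my_summ_wide
```

Because columns are picked by name, row order does not matter. With duplicated params/treatment pairs, pivot_wider() would warn and produce list-columns, so that assumption is worth checking first.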
Answer 2:
library(dplyr)  # attaches %>%
my_data %>%
  dplyr::group_by(params = as.factor(params)) %>%
  dplyr::summarize(
    # select rows by treatment value rather than by position, so row order does not matter
    hier_diff = mean_score[treatment == 1] - mean_score[treatment == 0],
    min_max_diff = min(min_score[treatment == 1]) - max(max_score[treatment == 0]),
    max_min_diff = max(max_score[treatment == 1]) - min(min_score[treatment == 0])
  )
Result
# A tibble: 3 x 4
params hier_diff min_max_diff max_min_diff
<fct> <dbl> <dbl> <dbl>
1 combo a 1 0 2
2 combo b 3 -2 5
3 combo c -1 -5 3
Note that the answer is the same even if the treatment rows appear before the no-treatment rows, e.g.:
my_data <- tribble(
~params, ~treatment, ~mean_score, ~min_score, ~max_score,
"combo a", 1, 92, 92, 92, # swapped rows 1 2, 3 4, 5 6
"combo a", 0, 91, 90, 92,
"combo b", 1, 92, 89, 92,
"combo b", 0, 89, 87, 91,
"combo c", 1, 89, 85, 93,
"combo c", 0, 90, 90, 90,
)
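One way to check that claim more generally is to shuffle the rows at random and confirm that the second answer's code gives the same summary. This is a sketch: the helper name summarize_diffs() and the seed are arbitrary choices for illustration, not part of the answer.

```r
library(dplyr)
library(tibble)

# Example data from the question (no-treatment rows first)
my_data <- tribble(
  ~params, ~treatment, ~mean_score, ~min_score, ~max_score,
  "combo a", 0, 91, 90, 92,
  "combo a", 1, 92, 92, 92,
  "combo b", 0, 89, 87, 91,
  "combo b", 1, 92, 89, 92,
  "combo c", 0, 90, 90, 90,
  "combo c", 1, 89, 85, 93
)

# The second answer's summary wrapped in a hypothetical helper
summarize_diffs <- function(d) {
  d %>%
    group_by(params = as.factor(params)) %>%
    summarize(
      hier_diff = mean_score[treatment == 1] - mean_score[treatment == 0],
      min_max_diff = min(min_score[treatment == 1]) - max(max_score[treatment == 0]),
      max_min_diff = max(max_score[treatment == 1]) - min(min_score[treatment == 0])
    )
}

set.seed(42)
shuffled <- my_data[sample(nrow(my_data)), ]  # same rows in a random order

by_original <- summarize_diffs(my_data)
by_shuffled <- summarize_diffs(shuffled)
identical(by_original, by_shuffled)
```

The position-based first answer only survives reordering because of its arrange(treatment) step; removing that step would make its result depend on the input row order.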