I have an R dataframe like so:
TS Wafer(1)Radius(38) Wafer(1)Radius(49) Wafer(2)Radius(06) Wafer(2)Radius(38)
2022-06-29T02:33:34.827582 699.565925 699.726346 700.526022 700.820165
2022-06-29T02:33:42.277582 700.053016 700.081032 700.558847 700.367924
2022-06-29T02:33:49.727582 699.707056 699.761278 700.485359 700.958339
2022-06-29T02:33:57.169972 699.909119 700.04703 700.419578 700.565945
2022-06-29T02:34:04.617582 699.74137 699.698093 700.397555 700.533587
2022-06-29T02:34:12.067582 700.106121 699.868698 700.573662 700.611651
I would like to get the min/max/mean for each Wafer grouped by each row.
For example, for the first row/TS, the max for Wafer(1)
would be 699.726346
. The max for Wafer(2)
would be 700.820165
. This would continue down the rows for each TS
.
I know there is a pmax()
function that can find the max/min across multiple columns but how do I group it to wafer1, wafer 2 etc?
Also I would like to apply other statistics such as mean median etc.
Is this possible?
Expected result would be something like:
TS Wafer(1)Radius(38) Wafer(1)Radius(49) Wafer(2)Radius(06) Wafer(2)Radius(38) Wafer1_Max wafer2_max wafer1_min wafer2_min wafer1_mean wafer2_mean
2022-06-29T02:33:34.827582 699.565925 699.726346 700.526022 700.820165 699.726346 700.820165 699.565925 700.526022 699.6461355 700.6730935
2022-06-29T02:33:42.277582 700.053016 700.081032 700.558847 700.367924 700.081032 700.558847 700.053016 700.367924 700.067024 700.4633855
2022-06-29T02:33:49.727582 699.707056 699.761278 700.485359 700.958339 699.761278 700.958339 699.707056 700.485359 699.734167 700.721849
2022-06-29T02:33:57.169972 699.909119 700.04703 700.419578 700.565945 700.04703 700.565945 699.909119 700.419578 699.9780745 700.4927615
2022-06-29T02:34:04.617582 699.74137 699.698093 700.397555 700.533587 699.74137 700.533587 699.698093 700.397555 699.7197315 700.465571
2022-06-29T02:34:12.067582 700.106121 699.868698 700.573662 700.611651 700.106121 700.611651 699.868698 700.573662 699.9874095 700.5926565
CodePudding user response:
If we pivot your data to a long format, this gets much easier as we can use standard grouping functions:
library(tidyr)
library(dplyr)
df %>%
pivot_longer(
-TS,
names_pattern = "([0-9] ).*([0-9] )",
names_to = c("Wafer", "Radius"),
values_to = "value"
) %>%
group_by(TS, Wafer) %>%
summarize(
max = max(value),
min = min(value),
mean = mean(value),
median = median(value),
.groups = "drop"
) %>%
as.data.frame
# TS Wafer max min mean median
# 1 2022-06-29T02:33:34.827582 1 699.7263 699.5659 699.6461 699.6461
# 2 2022-06-29T02:33:34.827582 2 700.8202 700.5260 700.6731 700.6731
# 3 2022-06-29T02:33:42.277582 1 700.0810 700.0530 700.0670 700.0670
# 4 2022-06-29T02:33:42.277582 2 700.5588 700.3679 700.4634 700.4634
# 5 2022-06-29T02:33:49.727582 1 699.7613 699.7071 699.7342 699.7342
# 6 2022-06-29T02:33:49.727582 2 700.9583 700.4854 700.7218 700.7218
# 7 2022-06-29T02:33:57.169972 1 700.0470 699.9091 699.9781 699.9781
# 8 2022-06-29T02:33:57.169972 2 700.5659 700.4196 700.4928 700.4928
# 9 2022-06-29T02:34:04.617582 1 699.7414 699.6981 699.7197 699.7197
# 10 2022-06-29T02:34:04.617582 2 700.5336 700.3976 700.4656 700.4656
# 11 2022-06-29T02:34:12.067582 1 700.1061 699.8687 699.9874 699.9874
# 12 2022-06-29T02:34:12.067582 2 700.6117 700.5737 700.5927 700.5927
To get back to wide format you can use this:
... %>%
pivot_wider(
names_from = "Wafer",
values_from = c("max", "min", "mean", "median"),
names_glue = "Wafer({Wafer})_{.value}"
)
# # A tibble: 6 × 9
# TS `Wafer(1)_max` Wafer(2)_ma…¹ Wafer…² Wafer…³ Wafer…⁴ Wafer…⁵ Wafer…⁶ Wafer…⁷
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 2022-06-29T02:33:34.827582 700. 701. 700. 701. 700. 701. 700. 701.
# 2 2022-06-29T02:33:42.277582 700. 701. 700. 700. 700. 700. 700. 700.
# 3 2022-06-29T02:33:49.727582 700. 701. 700. 700. 700. 701. 700. 701.
# 4 2022-06-29T02:33:57.169972 700. 701. 700. 700. 700. 700. 700. 700.
# 5 2022-06-29T02:34:04.617582 700. 701. 700. 700. 700. 700. 700. 700.
# 6 2022-06-29T02:34:12.067582 700. 701. 700. 701. 700. 701. 700. 701.
# # … with abbreviated variable names ¹`Wafer(2)_max`, ²`Wafer(1)_min`, ³`Wafer(2)_min`, ⁴`Wafer(1)_mean`,
# # ⁵`Wafer(2)_mean`, ⁶`Wafer(1)_median`, ⁷`Wafer(2)_median`