I would like to combine outcomes for participant groups, whilst keeping the separate levels. For example, I am aggregating data across multiple columns using:
aggregate(list(x$variable1, x$variable 2, x$variable 3), by = list (x$date, x$site, x$participanttype), FUN=mean, na-rm = TRUE)
There are 2 levels to participant type (SU and staff). The above gives me the aggregated outcomes for SU and staff but I also want a total that combines SU and staff. I've seen the following; however, they are only summing one variable (Y) and I have hundreds. It would be open to too many errors if I had to input them all - never mind the time:
Data %>%
add_row(X = "Total", Y = sum(.$Y)) %>%
mutate(X = factor(X))
How can I retain outcomes for SU and staff whilst having an additional 'total' output? I include a minimal dataset below.
Data<-data.frame(
X = factor(c("SU", "Staff")),
V1 = c(1000, 2000, 3000, 4000))
V2 = c(4000, 3000, 9000, 1000))
Thank you.
CodePudding user response:
Here is a solution using dplyr::summarize()
instead of aggregate()
.
library(tidyverse)
Data <- data.frame(
X = factor(c("SU", "Staff")),
V1 = c(1000, 2000, 3000, 4000),
V2 = c(4000, 3000, 9000, 1000)
)
Data %>%
group_by(X) %>%
summarize_all(sum) %>%
bind_rows(prepend(summarize_if(., is.numeric, sum), list(X = "total")))
#> # A tibble: 3 × 3
#> X V1 V2
#> <chr> <dbl> <dbl>
#> 1 Staff 6000 4000
#> 2 SU 4000 13000
#> 3 total 10000 17000
Created on 2022-03-21 by the reprex package (v2.0.1)
CodePudding user response:
Since you had written that you wanted to keep the original values and collect the sum values, I thought I would add to the great information you were already given. However, regardless of how many columns you have, this works. You simply have to select the range of columns by selecting the first and last columns to aggregate. That's shown here as V1:V2
. It is odd here with only two columns, but this is dynamic and can accommodate any number of columns.
I'm going to guess that there is a better way of doing this, but this works.
Data %>%
rowid_to_column() %>% # original order; perception of duplicates
pivot_longer(cols = c(V1:V2),
names_to = "name", # now all groups are in one column
values_to = "values") %>%
group_by(X, name) %>%
summarise(id = rowid,
Tot = sum(values), # collect the total for all groups
values = values,
.groups = "drop") %>%
pivot_wider(names_from = name,
values_from = c(Tot, values),
names_glue = "{name}_{.value}") %>% # groups to columns
arrange(id) %>% select(-id) # return to original order
# # A tibble: 4 × 5
# X V1_Tot V2_Tot V1_values V2_values
# <fct> <dbl> <dbl> <dbl> <dbl>
# 1 SU 4000 13000 1000 4000
# 2 Staff 6000 4000 2000 3000
# 3 SU 4000 13000 3000 9000
# 4 Staff 6000 4000 4000 1000
Data # original data frame
# X V1 V2
# 1 SU 1000 4000
# 2 Staff 2000 3000
# 3 SU 3000 9000
# 4 Staff 4000 1000