library(tidyverse)
set.seed(10)
dat <- data.frame(age = sample(14:79, size = 15, replace = TRUE),
sex = sample(c("m", "f"), size = 15, replace = TRUE),
region = sample(c("A", "B", "C"), size = 15, replace = TRUE),
var1 = runif(15, min = 0, max=100)) %>%
mutate(agegrp = cut(age, breaks = c(-Inf, 20, 50, 70, Inf), labels = c("<= 20", "21-50", "51-70", ">70")))
The table looks like this:
age sex region var1 agegrp
1 79 m A 64.669975 >70
2 78 m C 92.789062 >70
3 23 m A 69.626845 21-50
4 25 m C 5.074013 21-50
5 60 f C 10.340510 51-70
6 36 f B 90.294240 21-50
7 23 m A 12.769088 21-50
8 27 f A 43.892321 21-50
9 35 f B 99.793467 21-50
10 40 f C 94.284903 21-50
11 25 m A 98.829001 21-50
12 55 m A 98.007185 51-70
13 43 f A 37.491168 21-50
14 68 m A 90.051414 51-70
15 76 f B 13.567239 >70
Unfortunately, our customer needs the data to be in a pretty weird format like this:
split value var1_mean
1 agegrp 21-50 61.3
2 agegrp 51-70 66.1
3 agegrp >70 57.0
4 sex m 55.7
5 sex f 66.5
6 region A 64.4
7 region B 67.9
8 region C 50.6
I can easily do this using the following code, but it is very unelegant:
age <- dat %>% group_by(agegrp) %>%
summarise(var1_mean = mean(var1)) %>%
mutate(value = agegrp,
split = "agegrp") %>%
select(split, value, var1_mean)
sex <- dat %>% group_by(sex) %>%
summarise(var1_mean = mean(var1)) %>%
mutate(value = "sex",
split = "sex") %>%
select(split, value, var1_mean)
region <- dat %>% group_by(region) %>%
summarise(var1_mean = mean(var1)) %>%
mutate(value = "region",
split = "region") %>%
select(split, value, var1_mean)
rbind(age, sex, region)
Is there a way to make this easier without "stacking" several tables manually (maybe using dplyr)?
CodePudding user response:
You can pivot to long and summarise or, alternatively, iterate over the vars of interest:
library(dplyr)
library(purrr)
library(tidyr)
dat %>%
pivot_longer(-c(var1, age), names_to = "split") %>%
group_by(split, value) %>%
summarise(var1_mean = mean(var1))
Or:
map_df(set_names(c("agegrp", "sex", "region")), ~ dat %>%
group_by(across(.x)) %>%
summarise(var1_mean = mean(var1)) %>%
rename(value = .x), .id = "split")
# A tibble: 8 × 3
split value var1_mean
<chr> <chr> <dbl>
1 agegrp <= 20 54.6
2 agegrp 21-50 44.7
3 agegrp 51-70 46.4
4 sex f 37.4
5 sex m 55.5
6 region A 67.3
7 region B 47.9
8 region C 26.1
CodePudding user response:
Loop through columns and aggregate, then rowbind the results:
do.call(rbind,
lapply(c("agegrp", "sex", "region"), function(i){
cbind(split = i,
setNames(aggregate(as.formula(paste("age ~", i)), mean, data = dat),
c("value", "var1_mean")))
}))
# split value var1_mean
# 1 agegrp <= 20 20.00000
# 2 agegrp 21-50 33.30000
# 3 agegrp 51-70 60.50000
# 4 sex f 39.62500
# 5 sex m 39.71429
# 6 region A 35.75000
# 7 region B 39.00000
# 8 region C 43.60000
Note: output is different, probably because the seeded data and shown data do not match.