Let's say I have the R dataframe
major = c("math","math", "math", "econ", "econ", "econ", "chem","chem", "chem")
name = c("bob1", "bob2", "bob3", "carl1", "carl2", "carl3", "carl4","dave", "emily")
test1 = c(96,87,67,93,99,100,72,65,59)
test2 = c(46,78,90,95,76,85,67,99,91)
df = data.frame(name, major, test1, test2)
df
Which looks like:
name major test1 test2
1 bob1 math 96 46
2 bob2 math 87 78
3 bob3 math 67 90
4 carl1 econ 93 95
5 carl2 econ 99 76
6 carl3 econ 100 85
7 carl4 chem 72 67
8 dave chem 65 99
9 emily chem 59 91
I want to group by major
and then for each major, compute various metrics and output them all in one column instead of a separate column for each. eg
major statistic test1 test2
1 math Weighted mean [calculated score] [calculated Score]
2 math median [calculated score] [calculated Score]
3 econ Weighted mean [calculated score] [calculated Score]
4 econ median [calculated score] [calculated Score]
5 chem Weighted mean [calculated score] [calculated Score]
6 chem median [calculated score] [calculated Score]
I know one option would be to make a dataframe for just statistc = 'Weighted mean'
and one for statistc = 'median'
and then to take the union of the two, but I want to know if there is a more direct way that will generalize to more than two statistics.
CodePudding user response:
You can create a named vector and reference the names in the other column
df |>
group_by(major) |>
summarise(score = c(median = median(score), mean = mean(score)), statistic= names(score)) |>
ungroup()
major score statistic
<chr> <dbl> <chr>
1 chem 4 median
2 chem 3.5 mean
3 econ 2.7 median
4 econ 2.87 mean
5 math 3.4 median
6 math 3.37 mean
For multiple test scores:
df |>
group_by(major) |>
summarise(across(starts_with("test"), \(x) c(median = median(x), mean = mean(x))),
statistic = names(test1)) |>
ungroup()
major test1 test2 statistic
<chr> <dbl> <dbl> <chr>
1 chem 65 91 median
2 chem 65.3 85.7 mean
3 econ 99 85 median
4 econ 97.3 85.3 mean
5 math 87 78 median
6 math 83.3 71.3 mean