I want to calculate the percentages of items within groups. For example, there are 2 groups, each containing 3 kinds of fruit. Within each group, I want to know the proportion of each fruit (i.e. each group should add up to 100%). I can achieve this with the code below, but it feels too verbose. Can anyone suggest improvements, or an existing function that would simplify it?
library(tidyverse)
#some data
fruit <- rep(c("apples", "oranges", "bananas"),
times=c(3, 2, 5))
group <- rep(c(1, 2), times=c(35, 65))
df <- data.frame(fruit, group, stringsAsFactors=FALSE)
#get %'s for each fruit within each group
df2 <- df %>%
  # get numerator
  group_by(fruit, group) %>%
  summarise(`Total by fruit and group` = n()) %>%
  # get denominator
  left_join(df %>%
              group_by(group) %>%
              summarise(`Total by group` = n())) %>%
  # work out % as numerator / denominator * 100
  mutate(`%` = `Total by fruit and group` / `Total by group` * 100) %>%
  select(fruit, group, `%`) %>%
  arrange(group)
CodePudding user response:
You can do this in base R:
res <- with(df, table(fruit, group)) |>
  proportions(margin = 2) |>
  as.data.frame.table() |>
  transform(Freq = Freq * 100)
res
#     fruit group     Freq
# 1  apples     1 34.28571
# 2 bananas     1 42.85714
# 3 oranges     1 22.85714
# 4  apples     2 27.69231
# 5 bananas     2 53.84615
# 6 oranges     2 18.46154
Using "%" as a column name is discouraged. If possible, use only letters, numbers (not in the first position) and underscores, with no spaces.
Note: R >= 4.1 is required, because the code uses the native pipe |>.
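To illustrate the column-naming advice above, here is a small sketch. The data frame d and the replacement name pct are made up for this example only; they are not part of the question's data.

```r
# Hypothetical two-row frame, used only to illustrate the naming issue.
# check.names = FALSE keeps the non-syntactic name "%" as it is.
d <- data.frame(fruit = c("apples", "bananas"), `%` = c(40, 60),
                check.names = FALSE)

# A non-syntactic name has to be wrapped in backticks every time it is used
d$`%`

# make.names() shows the syntactic name R would substitute by default
make.names("%")

# Renaming to a syntactic name removes the need for backticks
names(d)[names(d) == "%"] <- "pct"
d$pct
```

A name such as pct can then be used directly in formulas, dplyr verbs and plotting code without any quoting.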
Data:
df <- structure(list(fruit = c("apples", "apples", "apples", "oranges",
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas",
"apples", "apples", "apples", "oranges", "oranges", "bananas",
"bananas", "bananas", "bananas", "bananas", "apples", "apples",
"apples", "oranges", "oranges", "bananas", "bananas", "bananas",
"bananas", "bananas", "apples", "apples", "apples", "oranges",
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas",
"apples", "apples", "apples", "oranges", "oranges", "bananas",
"bananas", "bananas", "bananas", "bananas", "apples", "apples",
"apples", "oranges", "oranges", "bananas", "bananas", "bananas",
"bananas", "bananas", "apples", "apples", "apples", "oranges",
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas",
"apples", "apples", "apples", "oranges", "oranges", "bananas",
"bananas", "bananas", "bananas", "bananas", "apples", "apples",
"apples", "oranges", "oranges", "bananas", "bananas", "bananas",
"bananas", "bananas", "apples", "apples", "apples", "oranges",
"oranges", "bananas", "bananas", "bananas", "bananas", "bananas"
), group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
)), class = "data.frame", row.names = c(NA, -100L))
CodePudding user response:
Using dplyr, you could do:
Reprex
- Code
library(dplyr)
df %>%
  group_by(group) %>%
  count(fruit) %>%
  mutate(freq = n / sum(n) * 100) %>%
  select(-n)
- Output
#> # A tibble: 6 x 3
#> # Groups:   group [2]
#>   group fruit    freq
#>   <dbl> <chr>   <dbl>
#> 1     1 apples   34.3
#> 2     1 bananas  42.9
#> 3     1 oranges  22.9
#> 4     2 apples   27.7
#> 5     2 bananas  53.8
#> 6     2 oranges  18.5
Created on 2022-02-19 by the reprex package (v2.0.1)
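Since the question requires each group to add up to 100%, it may be worth checking that explicitly. The following is a minimal sketch, not part of the answer above: it rebuilds the question's data, computes the percentages with an equivalent count() and mutate() pipeline, and then sums them per group. The names res, pct, totals and total are chosen here for illustration.

```r
library(dplyr)

# Same data as in the question
fruit <- rep(c("apples", "oranges", "bananas"), times = c(3, 2, 5))
group <- rep(c(1, 2), times = c(35, 65))
df <- data.frame(fruit, group, stringsAsFactors = FALSE)

res <- df %>%
  count(group, fruit) %>%            # one row per group/fruit combination
  group_by(group) %>%
  mutate(pct = n / sum(n) * 100) %>% # share of each fruit within its group
  ungroup()

# Sanity check: the percentages within each group should total 100
totals <- res %>%
  group_by(group) %>%
  summarise(total = sum(pct))
totals
```

The ungroup() call returns a plain tibble, so later operations on res are not silently applied per group.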
CodePudding user response:
Another possible solution, based on janitor::tabyl and janitor::adorn_percentages:
library(magrittr)
library(janitor)
df %>%
  tabyl(group, fruit) %>%
  adorn_percentages("row")
#>  group    apples   bananas   oranges
#>      1 0.3428571 0.4285714 0.2285714
#>      2 0.2769231 0.5384615 0.1846154
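The output above contains proportions between 0 and 1 rather than percentages. As a possible extension (a sketch, not part of the original answer), janitor's adorn_pct_formatting() can be appended to turn them into formatted percentage strings. The object name out and the choice of digits = 1 are assumptions made for this example.

```r
library(janitor)

# Same data as in the question
fruit <- rep(c("apples", "oranges", "bananas"), times = c(3, 2, 5))
group <- rep(c(1, 2), times = c(35, 65))
df <- data.frame(fruit, group, stringsAsFactors = FALSE)

out <- df |>
  tabyl(group, fruit) |>              # wide table of counts: one row per group
  adorn_percentages("row") |>         # row-wise proportions (0 to 1)
  adorn_pct_formatting(digits = 1)    # format as percentage strings
out
```

Note that the formatted columns are character vectors, so this step is best left until the values are only needed for display.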