I am using the dplyr package. Let's suppose I have the below table.
Group | count |
---|---|
A | 20 |
A | 10 |
B | 30 |
B | 35 |
C | 50 |
C | 60 |
My goal is to create a summary table that contains the mean per each group, and also, the percentage of the mean of each group compared to the total means added together. So the final table will look like this:
Group | avg | prcnt_of_total |
---|---|---|
A | 15 | .14 |
B | 32.5 | .31 |
C | 55 | .53 |
For example, 0.14 is the result of the following calculation: 15/(15 32.5 55)
Right now, I was only able to produce the first column code that calculates the mean for each group:
summary_df<- df %>%
group_by(Group)%>%
summarise(avg=mean(count))
I still don't know how to produce the prcnt_of_total column. Any suggestions?
CodePudding user response:
You can use the following code:
df <- read.table(text="Group count
A 20
A 10
B 30
B 35
C 50
C 60", header = TRUE)
library(dplyr)
df %>%
group_by(Group) %>%
summarise(avg = mean(count)) %>%
ungroup() %>%
mutate(prcnt_of_total = prop.table(avg))
#> # A tibble: 3 × 3
#> Group avg prcnt_of_total
#> <chr> <dbl> <dbl>
#> 1 A 15 0.146
#> 2 B 32.5 0.317
#> 3 C 55 0.537
Created on 2022-07-14 by the reprex package (v2.0.1)
CodePudding user response:
We can drop the group in summarise
itself.
library(dplyr)
df1 %>%
group_by(Group) %>%
summarise(avg = mean(count), .groups = "drop") %>%
mutate(prcnt_of_total = avg/sum(avg))
#> # A tibble: 3 x 3
#> Group avg prcnt_of_total
#> <chr> <dbl> <dbl>
#> 1 A 15 0.146
#> 2 B 32.5 0.317
#> 3 C 55 0.537
On another note, I am not sure if getting the average divided by the sum of averages is a meaningful metric unless we are sure to have the same number of entries per group. Given that, I suggested another solution as well.
## if you always have the same number of rows between the groups
df1 %>%
group_by(Group) %>%
summarise(avg = mean(count),
prcnt_of_total = sum(count)/sum(.$count))
#> # A tibble: 3 x 3
#> Group avg prcnt_of_total
#> <chr> <dbl> <dbl>
#> 1 A 15 0.146
#> 2 B 32.5 0.317
#> 3 C 55 0.537
Data:
read.table(text = "Group count
A 20
A 10
B 30
B 35
C 50
C 60",
header = T, stringsAsFactors = F) -> df1
CodePudding user response:
You can do this:
df %>%
group_by(Group) %>%
summarize(avg = mean(count), prcent_of_total = sum(count)/sum(df$count))
Output:
Group avg prcent_of_total
<chr> <dbl> <dbl>
1 A 15 0.146
2 B 32.5 0.317
3 C 55 0.537
data.table
is similar:
library(data.table)
setDT(df)[,.(avg = mean(count), prcent_of_total = sum(count)/sum(df$count)),Group]