I need to get the relative frequencies of a summarized column in R. I've used dplyr's summarize to find the total of each grouped row, like this:
data %>%
group_by(x) %>%
summarise(total = sum(dollars))
x total
<chr> <dbl>
1 expense 1 3600
2 expense 2 2150
3 expense 3 2000
But now I need to create a new column for the relative frequencies of each total row to get this result:
x total p
<chr> <dbl> <dbl>
1 expense 1 3600 46.45%
2 expense 2 2150 27.74%
3 expense 3 2000 25.81%
I've tried this:
data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = scales::percent(total/sum(total))
and this:
data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = total/sum(total)*100)
but the result is always this:
x total p
<chr> <dbl> <dbl>
1 expense 1 3600 100%
2 expense 2 2150 100%
3 expense 3 2000 100%
The problem seems to be the summarized total column that may be affecting the results. Any ideas to help me? Thanks
CodePudding user response:
You get 100% because of the grouping. However, after you've summarized, dplyr will drop the one level of grouping. Meaning that if you e.g. do mutate()
after, you get the results you need:
library(dplyr)
data <- tibble(
x = c("expense 1", "expense 2", "expense 3"),
dollars = c(3600L, 2150L, 2000L)
)
data %>%
group_by(x) %>%
summarise(total = sum(dollars)) %>%
mutate(p = total/sum(total)*100)
# A tibble: 3 x 3
x total p
<chr> <int> <dbl>
1 expense 1 3600 46.5
2 expense 2 2150 27.7
3 expense 3 2000 25.8
CodePudding user response:
You get 100% because it calculates the total of that particular group. You need to ungroup. Assuming that you want to divide by total entries just divide by nrow(df)
.
data %>%
group_by(x) %>%
summarise(total = sum(dollars), p = total/nrow(data)*100)
CodePudding user response:
After the first sum
, ungroup and create p
with mutate
.
iris %>%
group_by(Species) %>%
summarise(total = sum(Sepal.Length)) %>%
ungroup() %>%
mutate(p = total/sum(total)*100)
## A tibble: 3 x 3
# Species total p
# <fct> <dbl> <dbl>
#1 setosa 250. 28.6
#2 versicolor 297. 33.9
#3 virginica 329. 37.6