I don't fully understand how this code works. I found it in a tutorial guide:
Data manipulation in R - Steph Locke
On page 133 there is an example that I only partially understand.
library(tidyverse)
library(nycflights13)
flights %>%
group_by(month, carrier) %>%
summarise(n=n()) %>% ## number of rows (flights) per group
group_by(month) %>%
mutate(prop=scales::percent(n/sum(n)), n=NULL) %>%
spread(month, prop)
flights %>%
group_by(month, carrier) %>% ## This is grouping by months and within the months by carrier;
summarise(n=n()) %>% ## This counts the rows, giving the number of items for each month and each carrier;
At this point there is another group_by(); it looks like it is nested within group_by(month, carrier).
Then:
mutate(prop=scales::percent(n/sum(n)), n=NULL) %>% ## Calculates the percentage of items over the total and stores it in "prop"
The last line creates the matrix, putting the months in the columns and the values from prop in the cells.
I would like to better understand what exactly the second group_by(month) %>% is doing.
Thank you in advance for every reply.
Answer:
The second group_by is not needed here because, by default, summarise uses .groups = "drop_last" when each group yields a single row, as it does with n(). Therefore, after the first summarise, only a single grouping column, 'month', remains. We can change the code to
flights %>%
group_by(month, carrier) %>%
summarise(n=n()) %>%
mutate(prop=scales::percent(n/sum(n)), n=NULL)
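One way to check this is to inspect the grouping with group_vars() after summarise. The following is a minimal sketch on a small made-up data frame (toy is a hypothetical stand-in for flights, not data from the book); .groups = "drop_last" is written out explicitly only to make the default visible.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy <- tibble(
  month   = c(1, 1, 1, 2, 2),
  carrier = c("AA", "AA", "UA", "AA", "UA")
)

counts <- toy %>%
  group_by(month, carrier) %>%
  summarise(n = n(), .groups = "drop_last")  # explicit form of the default used here

# Only the last grouping variable (carrier) was dropped
group_vars(counts)  # "month"

# The data is still grouped by month, so sum(n) is the monthly total
props <- counts %>% mutate(prop = n / sum(n))
props
```

Because the result is still grouped by month, the proportions add up to 1 within each month rather than across the whole table.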
Suppose we set .groups to "drop" instead. Then summarise drops all the grouping variables, and a new group_by statement is needed. Also, mutate does not drop the grouping after the last group_by, so a final ungroup is useful.
flights %>%
group_by(month, carrier) %>%
summarise(n=n(), .groups = "drop") %>%
group_by(month) %>%
mutate(prop=scales::percent(n/sum(n)), n=NULL) %>%
ungroup()
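To see why the regrouping matters once everything has been dropped, here is a small sketch on made-up data (toy is again a hypothetical stand-in for flights). It compares the proportions with and without group_by(month) and shows that mutate keeps the grouping until ungroup() is called.

```r
library(dplyr)

# Hypothetical toy data standing in for flights
toy <- tibble(
  month   = c(1, 1, 1, 2, 2),
  carrier = c("AA", "AA", "UA", "AA", "UA")
)

dropped <- toy %>%
  group_by(month, carrier) %>%
  summarise(n = n(), .groups = "drop")

group_vars(dropped)  # character(0): no grouping left

# Without regrouping, sum(n) is the grand total over all months
overall <- dropped %>% mutate(prop = n / sum(n))

# With group_by(month), sum(n) is the total within each month
by_month <- dropped %>%
  group_by(month) %>%
  mutate(prop = n / sum(n))

group_vars(by_month)           # "month": mutate keeps the grouping
group_vars(ungroup(by_month))  # character(0)
```

In overall the proportions sum to 1 across the whole table, while in by_month they sum to 1 within each month, which is what the original example computes.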