I'm having trouble grouping by two variables while keeping a correct descending order.
I'm using a modified iris data frame for this. This is my script:
library(dplyr)
library(ggplot2)

iris_new <- iris %>%
  mutate(number_petals = sample(1:10, size = 150, replace = TRUE))

iris_new %>%
  group_by(number_petals, Species) %>%
  summarise(n = sum(Petal.Length, na.rm = TRUE)) %>%
  arrange(desc(n), by_group = TRUE) %>%
  head(25) %>%
  ggplot(aes(x = reorder(number_petals, n), y = n, fill = factor(Species))) +
  xlab("Number of Petals") +
  ylab("Total sum of petal lengths") +
  geom_col() # + coord_flip()
Two issues with this:
It has not sorted in descending order ever since I added the second group_by() variable (Species).
head(25) doesn't take the 25 highest sums of petal lengths per number_petals and per Species. It takes the 25 highest sums, agnostic of number_petals and Species.
I read that summarise() removes the second group_by variable, but I am not sure what to do with that information.
All help is greatly appreciated!
CodePudding user response:
Here is one way, using a factor-based approach.
We create two new columns, n and n1, where n1 is the sum of the n values in each number_petals. We then arrange the data by n1 and create factor levels according to their order of occurrence. These factor levels ensure the order of the bars.
library(dplyr)
library(ggplot2)

iris_new %>%
  group_by(Species, number_petals) %>%
  summarise(n = sum(Petal.Length, na.rm = TRUE), .groups = "drop") %>%
  # n1 is the total of n across species for each number_petals
  group_by(number_petals) %>%
  mutate(n1 = sum(n)) %>%
  arrange(desc(n1)) %>%
  ungroup() %>%
  # fix the factor levels in order of appearance, i.e. by descending n1
  mutate(number_petals = factor(number_petals, unique(number_petals))) %>%
  ggplot(aes(x = number_petals, y = n, fill = Species)) +
  xlab("Number of Petals") +
  ylab("Total sum of petal lengths") +
  geom_col()
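On the point about summarise() and grouping: the following is a minimal sketch of what .groups = "drop" changes, not a definitive account. It assumes dplyr 1.0.0 or later; the set.seed() call and the names still_grouped and ungrouped are only there for illustration. Note too that the grouping argument of arrange() is spelled .by_group, with a leading dot.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
iris_new <- iris %>%
  mutate(number_petals = sample(1:10, size = 150, replace = TRUE))

# Default: summarise() peels off the last grouping variable (Species),
# so the result is still grouped by number_petals.
still_grouped <- iris_new %>%
  group_by(number_petals, Species) %>%
  summarise(n = sum(Petal.Length, na.rm = TRUE))
group_vars(still_grouped)

# With .groups = "drop" the result carries no grouping at all.
ungrouped <- iris_new %>%
  group_by(number_petals, Species) %>%
  summarise(n = sum(Petal.Length, na.rm = TRUE), .groups = "drop")
group_vars(ungrouped)
```

The first call should report number_petals as the remaining grouping variable and the second should report none, which is why later verbs behave differently on the two results.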
head(25) selects the first 25 rows irrespective of the grouping. If you want to select the top rows by group, look into ?slice_max, or ?slice in general.
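As a rough sketch of the slice_max() suggestion, the code below keeps the largest sums within each Species. It assumes dplyr 1.0.0 or later; the choice of 3 rows per group, the set.seed() call and the name top_per_species are illustrative only and not part of the original question.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
iris_new <- iris %>%
  mutate(number_petals = sample(1:10, size = 150, replace = TRUE))

top_per_species <- iris_new %>%
  group_by(Species, number_petals) %>%
  summarise(n = sum(Petal.Length, na.rm = TRUE), .groups = "drop") %>%
  group_by(Species) %>%
  # keep the rows with the largest n within each Species;
  # with_ties = FALSE guarantees at most 3 rows per group
  slice_max(n, n = 3, with_ties = FALSE) %>%
  ungroup()

top_per_species
```

Because the data is grouped by Species when slice_max() runs, the selection happens separately inside each group, unlike head(), which only looks at row position.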
To select the top n (5 here) values, here's a different approach using a join.
iris_new %>%
  group_by(number_petals) %>%
  summarise(n = sum(Petal.Length, na.rm = TRUE), .groups = "drop") %>%
  # keep the 5 number_petals values with the largest totals
  slice_max(n, n = 5) %>%
  # bring back the per-species sums (n1) for those values
  inner_join(iris_new %>%
               group_by(Species, number_petals) %>%
               summarise(n1 = sum(Petal.Length, na.rm = TRUE), .groups = "drop"),
             by = "number_petals") %>%
  arrange(desc(n)) %>%
  mutate(number_petals = factor(number_petals, unique(number_petals))) %>%
  ggplot(aes(x = number_petals, y = n1, fill = Species)) +
  xlab("Number of Petals") +
  ylab("Total sum of petal lengths") +
  geom_col()