I have the following data:
df <- structure(list(validated_1 = c("sombra", "sombra", "sombra",
"sombra", "sombra", "sombra", "sombra", "sombra", "sombra", "sombra",
"coscinodiscus", "sombra", "coscinodiscus", "coscinodiscus",
"sombra", "coscinodiscus", "sombra", "coscinodiscus", "sombra",
"coscinodiscus", "coscinodiscus", "detritos", "detritos", "coscinodiscus",
"appendicularia", "detritos", "coscinodiscus", "coscinodiscus",
"detritos", "coscinodiscus", "langanho", "detritos", "copepodo",
"langanho", "copepodo", "langanho", "langanho", "coscinodiscus",
"coscinodiscus", "coscinodiscus"), validated_2 = c("sombra",
"sombra", "sombra", "sombra", "sombra", "sombra", "sombra", "sombra",
"sombra", "sombra", "coscinodiscus", "sombra", "coscinodiscus",
"coscinodiscus", "sombra", "coscinodiscus", "sombra", "coscinodiscus",
"sombra", "coscinodiscus", "coscinodiscus", "detritos", "detritos",
"coscinodiscus", "zooplâncton", "detritos", "coscinodiscus",
"coscinodiscus", "detritos", "coscinodiscus", "langanho", "detritos",
"zooplâncton", "langanho", "zooplâncton", "langanho", "langanho",
"coscinodiscus", "coscinodiscus", "coscinodiscus")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -40L))
I summarise the data like this and generate this graph:
df %>%
  group_by(validated_1) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  mutate(groups = c(rep("high N", 2), rep("lower N", 4))) %>%
  ggplot(aes(x = reorder(validated_1, -count), y = count)) +
  geom_bar(stat = 'identity') +
  facet_wrap(~ groups, nrow = 2, scales = "free") +
  geom_text(aes(label = count), vjust = -0.5, size = 3)
This way I can facet by count, but I cannot fill the bars by the groups in the variable validated_2.
Another approach I tried was:
# fct_infreq() comes from the forcats package
df %>%
  ggplot(aes(x = fct_infreq(validated_1), fill = validated_2)) +
  geom_bar()
This way I was able to fill the bars. However, I don't know how to facet the data by count and add the count above each bar.
Besides that, I noticed that this approach is much slower than the first one (without the fill) for huge datasets (more than 10 million rows).
Thanks all
Answer:
Add validated_2 to the group_by() so that it is still present in the dataset after summarizing and can be mapped to fill. You can also simplify this step by switching to dplyr::count():
library(dplyr)
library(ggplot2)
df %>%
  count(validated_1, validated_2, sort = TRUE, name = "count") %>%
  # the hard-coded groups assume exactly 6 rows after count(), as with the sample data
  mutate(groups = c(rep("high N", 2), rep("lower N", 4))) %>%
  ggplot(aes(x = reorder(validated_1, -count), y = count)) +
  geom_col(aes(fill = validated_2)) +
  facet_wrap(~groups, nrow = 2, scales = "free") +
  geom_text(aes(label = count), vjust = -0.5, size = 3)
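If a validated_1 category can map to more than one validated_2 value, count() returns more than one row per bar. The hard-coded rep() groups would then no longer match the number of rows, and the labels would be drawn per segment rather than at the top of each stacked bar. Below is a minimal sketch of one way around that, not a definitive solution. It uses a small made-up data frame in place of the question's df, and it assumes "high N" means the two categories with the largest totals, mirroring the rep("high N", 2) above. The names totals and plot_data are hypothetical helpers.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for the question's `df`; "copepodo" deliberately maps
# to two different validated_2 values to show the multi-segment case.
df <- tibble(
  validated_1 = c(rep("sombra", 6), rep("coscinodiscus", 5),
                  rep("copepodo", 3), "appendicularia"),
  validated_2 = c(rep("sombra", 6), rep("coscinodiscus", 5),
                  "zooplâncton", "zooplâncton", "detritos", "zooplâncton")
)

# One row per (validated_1, validated_2) pair: these are the bar segments.
counts <- df %>%
  count(validated_1, validated_2, name = "count")

# One row per bar: the total height and the facet group derived from it.
# Assumption: the two largest totals form the "high N" facet.
totals <- counts %>%
  group_by(validated_1) %>%
  summarise(total = sum(count)) %>%
  mutate(groups = if_else(row_number(desc(total)) <= 2, "high N", "lower N"))

# Attach total and facet group to every segment row.
plot_data <- counts %>%
  left_join(totals, by = "validated_1")

p <- ggplot(plot_data, aes(x = reorder(validated_1, -total), y = count)) +
  geom_col(aes(fill = validated_2)) +
  # Labels come from `totals`, so each bar gets a single label at its top.
  geom_text(data = totals, aes(y = total, label = total),
            vjust = -0.5, size = 3) +
  facet_wrap(~ groups, nrow = 2, scales = "free")

print(p)
```

Because the plot is built from the summarised table rather than the raw rows, ggplot only has to draw one row per category pair, which should also help with the very large datasets mentioned in the question.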