I have a graph that shows the distinct tencode groups (factors) and their respective counts across each year. Reference image here (ignore bad axis labels):
The goal is to show only the top N "Tencodes" by the total count you see here instead of showing all of them, then order them in descending order of total count within each year, so that only the groups with the most impact appear.
Here is the code I'm trying, but I keep getting an error:
complete_df %>%
  count(Tencode.Description, sort = TRUE) %>%
  head(10) %>%
  na.omit() %>%
  mutate(Tencode.Description = fct_reorder(Tencode.Description, n)) %>%
  ggplot(aes(Year, fill = Tencode.Description)) +
  geom_bar(position = position_dodge()) +
  scale_y_log10(labels = comma) +
  #facet_wrap(~ Year) +
  labs(fill = "Tencode") +
  theme(
    axis.title.x = element_text(color = "black", size = 15, face = "bold"),
    axis.title.y = element_text(color = "black", size = 15, face = "bold"),
    legend.title = element_text(colour = "black", size = 10, face = "bold"),
    legend.text = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.x = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.y = element_text(colour = "black", size = 10, face = "plain")
  ) +
  labs(x = "Year", y = "Count")
Any help would be greatly appreciated. Here is a summary of my raw data as well:
CodePudding user response:
What you need to do is identify the top N codes before plotting, then filter the data that feeds into ggplot. Here is an example with some random data, filtered to the top 10:
library(dplyr)
library(ggplot2)
library(scales)
# create sample data, using runif for the count figures
set.seed(100)
sample_code <- c("Bomb Threat", "Burglary - Non-Residence", "Burglary - Residence",
                 "Community Policing Activity", "Corpse / D. O. A.",
                 "Cutting / Stabbing", "Dangerous / Injured Animal",
                 "Disorderly Person", "Drowning", "Fight / Assault",
                 "Fire", "Hold up / Robbery", "Intoxicated Person", "Theft",
                 "Transport Prisoner / Suspect", "Vehicle Accident",
                 "Missing Person", "Person Indecently Exposed")
code_data <- tibble(
  Year = sort(rep(seq(2015, 2022, by = 1), 18)),
  code = rep(sample_code, 8),
  count = round(runif(18 * 8, 1, 100), digits = 0)
)
# identify the top 10 codes of all time
top_10_overall <- code_data %>%
  group_by(code) %>%
  summarize(total_count = sum(count), .groups = "drop") %>%
  arrange(desc(total_count)) %>%
  head(10)
# set the levels explicitly so they follow descending total count
# (factor() alone would sort them alphabetically)
top_ten_code <- factor(top_10_overall$code, levels = top_10_overall$code)
# filter data to the top 10 codes and convert to factor
to_graph_data <- code_data %>%
  filter(code %in% as.character(top_ten_code)) %>%
  mutate(code = factor(code, levels = levels(top_ten_code)))
# plot the filtered data
ggplot(data = to_graph_data, aes(fill = code)) +
  geom_bar(aes(x = Year, y = count), stat = "identity", position = position_dodge()) +
  labs(fill = "Tencode") +
  theme(
    axis.title.x = element_text(color = "black", size = 15, face = "bold"),
    axis.title.y = element_text(color = "black", size = 15, face = "bold"),
    legend.title = element_text(colour = "black", size = 10, face = "bold"),
    legend.text = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.x = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.y = element_text(colour = "black", size = 10, face = "plain")
  ) +
  labs(x = "Year", y = "Count") +
  scale_y_continuous(expand = c(0, 0)) +
  guides(fill = guide_legend(reverse = TRUE))
Created on 2022-07-25 by the reprex package (v2.0.1)
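A likely cause of the error in the question's code: count(Tencode.Description) collapses the data to one row per code, so the Year column no longer exists by the time ggplot(aes(Year, ...)) runs. If the raw data has one row per incident, as the use of geom_bar() without a y aesthetic suggests, the same "rank first, then filter" idea could be sketched as below. The toy complete_df, its values, and top_n_codes are assumptions made up for illustration; only the column names come from the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the question's complete_df: one row per incident
set.seed(1)
complete_df <- tibble(
  Year = sample(2019:2021, 300, replace = TRUE),
  Tencode.Description = sample(
    c("Theft", "Fire", "Drowning", "Fight / Assault", "Missing Person"),
    300, replace = TRUE, prob = c(0.4, 0.25, 0.2, 0.1, 0.05)
  )
)

top_n_codes <- 3  # hypothetical N; the question uses 10

# Rank codes by total count over all years and keep the names of the top N
top_codes <- complete_df %>%
  filter(!is.na(Tencode.Description)) %>%
  count(Tencode.Description, sort = TRUE) %>%
  head(top_n_codes) %>%
  pull(Tencode.Description)

# Filter the raw rows (Year is still present) and order the factor
# levels by descending total count
plot_df <- complete_df %>%
  filter(Tencode.Description %in% top_codes) %>%
  mutate(Tencode.Description = factor(Tencode.Description, levels = top_codes))

# geom_bar() counts the rows per Year and code itself
p <- ggplot(plot_df, aes(x = factor(Year), fill = Tencode.Description)) +
  geom_bar(position = position_dodge()) +
  labs(x = "Year", y = "Count", fill = "Tencode")
p
```

Because the ranking is computed separately and only used as a filter, the data passed to ggplot keeps every original column, and the bars within each year follow the overall ranking.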