How to only select the top N groups based on total count to plot using ggplot in R

Time:07-26

I have a graph that shows the distinct tencode groups (factors) and their respective counts across each year. Reference image here (ignore bad axis labels):

[image: bar chart of tencode counts by year]

The goal is to show only the top N "Tencodes" by total count instead of showing all of them, ordered in descending order of total count, so that each year shows only the groups with the most impact.

Here is the code I'm trying, but I keep getting an error:

complete_df %>%
  count(Tencode.Description, sort = TRUE) %>%
  head(10) %>%
  na.omit() %>%
  mutate(Tencode.Description = fct_reorder(Tencode.Description, n)) %>%
  ggplot(aes(Year, fill = Tencode.Description)) +
  geom_bar(position=position_dodge()) +
  scale_y_log10(labels = comma) +
  #facet_wrap(~ Year) +
  labs(fill = "Tencode") + theme(
    axis.title.x = element_text(color="black", size=15, face="bold"),
    axis.title.y = element_text(color="black", size=15, face="bold"),
    legend.title = element_text(colour = "black", size = 10, face = "bold"),
    legend.text = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.x = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.y = element_text(colour = "black", size = 10, face = "plain")) +
  labs( x="Year", y = "Count")

Any help would be greatly appreciated. Here is a summary of my raw data as well:



CodePudding user response:

What you need to do is identify the top N groups before plotting, then filter the data that feeds into ggplot down to those groups. Here is an example with some random data, filtered to the top 10:

library(dplyr)
library(ggplot2)
library(scales)

# create sample data, using runif for the count figures
set.seed(100)
sample_code <- c("Bomb Threat", "Burglary - Non-Residence", "Burglary - Residence",
                 "Community Policing Activity", "Corpse / D. O. A.",
                 "Cutting / Stabbing", "Dangerous / Injured Animal",
                 "Disorderly Person", "Drowning", "Fight / Assault",
                 "Fire", "Hold up / Robbery", "Intoxicated Person", "Theft",
                 "Transport Prisoner / Suspect", "Vehicle Accident",
                 "Missing Person", "Person Indecently Exposed")
code_data <- tibble(
  Year = sort(rep(seq(2015, 2022, by = 1), 18)),
  code = rep(sample_code, 8),
  count = round(runif(18 * 8, 1, 100), digits = 0)
)

# identify the top 10 codes by total count across all years
top_10_overall <- code_data %>%
  group_by(code) %>%
  summarize(total_count = sum(count), .groups = "drop") %>%
  arrange(desc(total_count)) %>%
  head(10)
top_ten_code <- factor(top_10_overall$code)

# filter data with top 10 codes and convert to factor
to_graph_data <- code_data %>%
  filter(code %in% as.character(top_ten_code)) %>%
  mutate(code = factor(code, levels = levels(top_ten_code)))

# plot the filtered data
ggplot(data = to_graph_data, aes(fill = code)) +
  geom_bar(aes(x = Year, y = count), stat = "identity", position = position_dodge()) +
  labs(fill = "Tencode") +
  theme(
    axis.title.x = element_text(color="black", size=15, face="bold"),
    axis.title.y = element_text(color="black", size=15, face="bold"),
    legend.title = element_text(colour = "black", size = 10, face = "bold"),
    legend.text = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.x = element_text(colour = "black", size = 10, face = "plain"),
    axis.text.y = element_text(colour = "black", size = 10, face = "plain")) +
  labs(x = "Year", y = "Count") +
  scale_y_continuous(expand = c(0, 0)) +
  guides(fill = guide_legend(reverse = TRUE))

Created on 2022-07-25 by the reprex package (v2.0.1)
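
The same idea can be applied to data with one row per recorded event, which is what the question's use of count() and geom_bar() without a y aesthetic suggests complete_df looks like. A likely cause of the error in the question is that count(Tencode.Description) returns only Tencode.Description and n, so Year no longer exists by the time ggplot() is called. The sketch below is a minimal illustration under those assumptions: the events tibble, its values, and N = 3 are made up for the example and are not the asker's data. It also orders the factor levels by total count, so the bars and legend follow descending totals.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for complete_df: one row per recorded event
set.seed(1)
codes <- c("Theft", "Fire", "Drowning", "Fight / Assault", "Vehicle Accident")
events <- tibble(
  Year = sample(2019:2021, 500, replace = TRUE),
  Tencode.Description = sample(codes, 500, replace = TRUE,
                               prob = c(0.4, 0.3, 0.15, 0.1, 0.05))
)

top_n_codes <- 3  # N, chosen arbitrarily for this sketch

# Step 1: total count per code over all years, keep the N largest.
# slice_max() returns the rows sorted from largest to smallest total.
top_codes <- events %>%
  count(Tencode.Description, name = "total") %>%
  slice_max(total, n = top_n_codes, with_ties = FALSE)

# Step 2: filter the original rows to those codes. Year is still present
# here because the raw rows are filtered rather than replaced by counts.
plot_data <- events %>%
  filter(Tencode.Description %in% top_codes$Tencode.Description) %>%
  mutate(Tencode.Description = factor(Tencode.Description,
                                      levels = top_codes$Tencode.Description))

# Step 3: geom_bar() counts the rows for each Year and code combination
p <- ggplot(plot_data, aes(x = factor(Year), fill = Tencode.Description)) +
  geom_bar(position = position_dodge()) +
  labs(x = "Year", y = "Count", fill = "Tencode")
p
```

If the data is already aggregated to one row per year and code, as in the sample data above, keep stat = "identity" with a y aesthetic instead of letting geom_bar() count rows.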
