Visualization of categorical variables (1D) with ggplot

Time:04-08

I have a collection of categorical data and I'm trying to figure out how best to visualize it. It is a "simple" list (97 categories long) with just a name and an associated value. Here's a sample to work with (in the actual set, the names are much longer):

names <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", 
"O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "AA", "AB", "AC", "AD", 
"AE", "AF", "AG", "AH", "AI", "AJ", "AK", "AL", "AM", "AN", "AO", "AP", "AQ", "AR", 
"AS", "AT", "AU", "AV", "AW", "AX", "AY", "AZ", "BA", "BB", "BC", "BD", "BE", "BF", 
"BG", "BH", "BI", "BJ", "BK", "BL", "BM", "BN", "BO", "BP", "BQ", "BR", "BS", "BT", 
"BU", "BV", "BW", "BX", "BY", "BZ", "CA", "CB", "CC", "CD", "CE", "CF", "CG", "CH", 
"CI", "CJ", "CK", "CL", "CM", "CN", "CO", "CP", "CQ", "CR", "CS")

cts <- c(620, 343, 165, 121, 107, 106, 104, 88, 83, 59, 57, 56, 49, 45, 44, 37, 37, 
37, 37, 35, 31, 31, 29, 27, 24, 23, 23, 22, 21, 21, 20, 20, 17, 17, 16, 16, 15, 15, 
15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 8, 8, 7, 6, 5, 5, 5, 5, 5, 
4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1)

testdat <- data.frame(names, cts)

My initial thought was to make a lollipop chart, but because there are so many points, R squishes them all together and it becomes a mess. Bar/column charts are out for the same reason. I thought it might be possible to make a plot in which all of the categories have a shape (a box?) with an area that corresponds to the cts variable, but I haven't found anything like that (I tried the waffle package, but I'm struggling with it). Everything that I've found seems to require at least two numerical values to plot in an x vs y way.

My next data set is in the same format, only with many more categories (9,351 instead of "just" 97), so I'm hoping for something I could expand.

Anyone have ideas on how to look at this data without cutting it up?

CodePudding user response:

Maybe a treemap?

library(ggplot2)
library(treemapify)

ggplot(testdat, aes(area = cts, fill = names)) +
    geom_treemap() +
    geom_treemap_text(aes(label = names), place = 'center') +
    scale_fill_discrete(guide = 'none', limits = sample(testdat$names))

CodePudding user response:

There are two problems here. One is that a plot with 100 labelled categories is always going to be too busy. It might be better to plot the sorted values on a numeric axis, and label some important illustrative categories:

library(tidyverse)
library(ggrepel)

set.seed(1) 

plot_dat <- testdat %>%
  arrange(cts) %>%
  mutate(num = seq(nrow(.)))

ggplot(plot_dat, aes(num, cts)) +
  geom_point(colour = "deepskyblue4") +
  geom_label_repel(data = plot_dat[sample(nrow(plot_dat), 5), ],
                   aes(label = names), nudge_y = 0.5) +
  scale_y_log10() +
  theme_minimal() +
  ggtitle("Ordered cts values (log scale)") +
  labs(x = "") +
  theme(text = element_text(size = 16))
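
The plot above labels five randomly sampled rows. If the largest categories are the ones worth pointing out, the label subset could be chosen by value instead. A minimal base-R sketch, using a small made-up data frame (`toy`) as a stand-in for `testdat`; `toy`, `n_labels` and `label_dat` are hypothetical names, not part of the original answer:

```r
# Toy stand-in for testdat (made-up values, same two columns)
toy <- data.frame(names = c("A", "B", "C", "D", "E", "F", "G"),
                  cts   = c(620, 343, 165, 121, 5, 2, 1))

# Keep the n rows with the largest counts to use as labels
n_labels  <- 3
label_dat <- head(toy[order(toy$cts, decreasing = TRUE), ], n_labels)
label_dat$names
```

Applied to `plot_dat`, a subset built this way could be passed as the `data` argument of `geom_label_repel()` in place of the random sample.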

The other problem is filling 2D space with 1D information. You can do this with a treemap, or with circle packing:

library(packcircles)
library(ggplot2)
library(ggforce)

testdat <- cbind(testdat, circleRepelLayout(testdat$cts)$layout)

ggplot(testdat, aes(x0 = x, y0 = y, fill = radius)) +
  geom_circle(aes(r = radius)) +
  # rank() rather than order(), so that label size tracks circle size
  geom_text(aes(x, y, label = names, size = rank(radius))) +
  coord_equal() +
  theme_void() +
  scale_fill_distiller(palette = "Pastel1") +
  theme(legend.position = "none")

Or just a waffle-style grid where the fill colour represents the (log) value:

library(tidyverse)

testdat %>%
  mutate(x = rep(1:10, each = 10)[seq(nrow(.))],
         y = rep(1:10, 10)[seq(nrow(.))]) %>%
  ggplot(aes(x, y, fill = log(cts))) +
  geom_tile(width = 0.8, height = 0.8) +
  geom_text(aes(label = names), color = "white") +
  scale_fill_viridis_c(option = "E") +
  coord_equal() +
  theme_void()
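
The 10 x 10 grid above is hard-coded, so it only has room for 100 categories. For a larger set, such as the 9,351 categories mentioned in the question, the grid positions could be derived from the row count. A rough sketch in base R, assuming a near-square layout is acceptable; `grid_positions` is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper: place n items on a near-square grid,
# filling one column (x) at a time, as in the 10 x 10 version above
grid_positions <- function(n) {
  side <- ceiling(sqrt(n))      # smallest square grid that holds n items
  i <- seq_len(n) - 1           # zero-based item index
  data.frame(x = i %/% side + 1,
             y = i %% side + 1)
}

pos <- grid_positions(97)       # same layout as the hard-coded grid
big <- grid_positions(9351)     # needs a 97 x 97 grid
```

The resulting `x` and `y` columns could then be bound to the data with `cbind()` or `mutate()` before plotting. At several thousand tiles the text labels will not be legible, so they would probably have to be dropped.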

CodePudding user response:

Simple approach:

Use a bar chart of the log-transformed counts:

library(tidyverse)

testdat %>% 
  mutate(names = factor(names, levels = names)) %>% 
  ggplot(aes(x = names, y = log(cts), fill = log(cts))) +
  geom_col() +
  theme_bw()
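
One thing to watch with this approach: `log(1)` is 0, so the categories with a count of 1 get a bar of zero height. A small sketch of the effect, with `log1p()` (which computes `log(1 + x)`) shown as one possible alternative. This is a suggestion rather than part of the original answer, and `cts_example` is a made-up vector:

```r
cts_example <- c(1, 2, 620)        # a few illustrative counts

log_vals   <- log(cts_example)     # first value is exactly 0: no visible bar
log1p_vals <- log1p(cts_example)   # first value is log(2): bar stays visible

log_vals
log1p_vals
```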
