Making a Reingold-Tilford tree in R

Time:08-30

I have a dataset that contains information on a group of companies. Among the information is a classification (referred to as SBI-code) of the sector in which the company is reportedly active.

The SBI classification is hierarchical, with main groups and sub-groups. For example, SBI 01 is agriculture, SBI 01.11 is agriculture - growing wheat, SBI 01.13 is agriculture - growing vegetables, etc.
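To make the hierarchy concrete, here is a small base R sketch. It assumes that the first two digits of a code identify its main group, and that the dotted notation (01.11) and the undotted notation used in the data further down (4941) refer to the same scheme. The vector `sbi` is made up for illustration.

```r
# Example codes: two in dotted notation, two undotted (illustrative values)
sbi <- c("01.11", "01.13", "4941", "46901")

# Drop the dots so both notations look the same
sbi_plain <- gsub(".", "", sbi, fixed = TRUE)

# Assumption: the first two digits identify the main group
main_group <- substr(sbi_plain, 1, 2)

data.frame(sbi, sbi_plain, main_group)
```

Keeping the codes as character strings matters here, because converting them to numbers would drop the leading zero of codes such as 01411.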

What I want to make is a (Reingold-Tilford?) tree in R of the SBI classifications that are present in my dataset, similar to this one but vertical:

[image: example tree diagram, not reproduced here]

Below are the relevant columns of my dataset (E1... is the ID for the individual companies); the full SBI classification can be downloaded here (csv). I've tried looking for similar examples, but most of them are far more complex than what I am trying to do. Can this be done in R with the data that I have?

The reason I want to do it in R is that the dataset contains 52 companies now, but it may grow to 500 over the next year, and I want to be able to update the tree frequently.

structure(list(E1.Documentnummer = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", 
"17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", 
"28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", 
"39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49", 
"50", "51", "52"), CI.SBI.Hoofdact = c("4941", "4941", "46901", 
"4622", "46383", "52291", "3230", "52291", "4941", "52291", "4612", 
"1013", "52291", "1013", "4622", "50201", "46472", "01411", "46311", 
"01499", "4622", "1086", "1089", "1013", "4618", "2815", "4941", 
"1071", "01303", "6612", "46499", "1013", "46311", "46473", "1013", 
"1013", "4632", "46311", "46311", "6312", "4632", "4622", "4622", 
"4661", "0162", "1011", "4632", "4632", "46762", "1811", "01305", 
"10822")), row.names = c(NA, -52L), class = c("tbl_df", "tbl", 
"data.frame"))

CodePudding user response:

It seemed to me that you could use ggbump::geom_sigmoid() to get these nicely curved lines (no graph theory required here). Here is an example of how you could plot your data with it.

library(dplyr)
library(ggplot2)
library(ggbump)

# df <- structure(...) # omitted for brevity

df2 <- df %>%
  # Wrangle to count number of groups
  mutate(main = substr(CI.SBI.Hoofdact, 1, 2)) %>%
  group_by(main, CI.SBI.Hoofdact) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  # Determine graphical positions
  mutate(
    label = paste0(CI.SBI.Hoofdact, " (n = ", n, ")"),
    pos = match(CI.SBI.Hoofdact, sort(unique(CI.SBI.Hoofdact)))
  ) %>%
  group_by(main) %>%
  mutate(mid_pos = mean(pos))
#> `summarise()` has grouped output by 'main'. You can override using the `.groups`
#> argument.
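The positioning step above can be checked by hand on a few codes. The sketch below redoes it in base R, assuming (as the pipeline does) that the first two characters give the main group. Here `ave()` stands in for the grouped `mutate()`, and the five codes are a small subset picked for illustration.

```r
# A few distinct codes taken from the question's data
codes <- c("0162", "1011", "1013", "4622", "4632")
main  <- substr(codes, 1, 2)

# pos: rank of each code among the sorted unique codes (right-hand side of the plot)
pos <- match(codes, sort(unique(codes)))

# mid_pos: mean position within each main group (left-hand side of the plot)
mid_pos <- ave(as.numeric(pos), main)

data.frame(main, codes, pos, mid_pos)
```

Each curve then runs from (0, mid_pos) to (1, pos), so all sub-groups of one main group fan out from the same point on the left.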

ggplot(df2, aes(x = 0, y = pos)) +
  # Connections between points
  geom_sigmoid(
    aes(xend = 1, y = mid_pos, yend = pos,
        group = CI.SBI.Hoofdact, size = n)
  ) +
  # Left labels
  geom_text(
    data = ~ .x[!duplicated(.x$main), ],
    aes(y = mid_pos, label = main),
    hjust = 1
  ) +
  # Right labels
  geom_text(aes(x = 1, label = label),
            hjust = 0) +
  # Controls thickness of lines
  scale_size_continuous(
    range = c(0, 3), limits = c(0, NA)
  ) +
  # Decorative stuff
  scale_x_continuous(
    expand = c(0, 0.5),
    breaks = c(0, 1),
    labels = c("Main Group", "Sub Group")
  ) +
  theme_void() +
  theme(axis.text.x.bottom = element_text())

Created on 2022-08-26 by the reprex package (v2.0.0)
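If you do want an actual Reingold-Tilford layout rather than sigmoid curves, igraph's layout_as_tree() computes one. The following is only a rough sketch, not part of the reprex above. It assumes the igraph package is installed, uses a handful of codes from the question, and adds a made-up root node called "SBI" so that all main groups hang off a single tree.

```r
library(igraph)

# A few codes from the question's data, for illustration only
codes <- c("4941", "4622", "46311", "1013", "1071", "0162", "01411")
main  <- substr(codes, 1, 2)  # assumption: first two digits = main group

# Edge list: hypothetical root "SBI" -> main group -> full code
edges <- rbind(
  data.frame(from = "SBI", to = unique(main)),
  data.frame(from = main,  to = codes)
)

g    <- graph_from_data_frame(edges, directed = TRUE)
root <- which(V(g)$name == "SBI")

# Reingold-Tilford tree layout: one row of x/y coordinates per vertex
lay <- layout_as_tree(g, root = root)

plot(g, layout = lay, vertex.shape = "none", edge.arrow.size = 0.3)
```

With the full dataset you would build `codes` from unique(df$CI.SBI.Hoofdact) instead of typing them in, so the tree updates whenever the data does.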
