I have a dataset that contains information on a group of companies. Among the information is a classification (referred to as SBI-code) of the sector in which the company is reportedly active.
The SBI classification is hierarchical, with main groups and subgroups. For example, SBI 01 is agriculture, SBI 01.11 is agriculture - growing wheat, 01.13 is agriculture - growing vegetables, etc.
What I want to make is a tree (Reingold-Tilford?) in R of the SBI classifications that are present in my dataset. Similar to this one, but vertical:
Below are the relevant columns of my dataset (E1... is the ID for the individual companies); the full SBI classification can be downloaded here (csv). I've tried looking for similar examples, but most of them are far more complex than what I am trying to do. Can this be done in R with the data that I have?
The reason I want to do it in R is that, while the dataset has 52 companies now, it may grow to 500 over the next year and I want to be able to update the tree frequently.
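As a minimal sketch of how this hierarchy can be read from the codes themselves, assuming that the first two digits always identify the main group and that codes are stored as character strings (so leading zeros survive). The names `codes`, `dotted` and `main_group` are made up for illustration:

```r
# Hypothetical sample of SBI codes, stored as strings to keep leading zeros
codes <- c("01411", "01499", "1013", "4941")

# Dotted notation such as "01.11" can be normalised by dropping the dots
dotted <- c("01.11", "01.13")
undotted <- gsub(".", "", dotted, fixed = TRUE)

# Assumption: the first two digits are the main group
main_group <- substr(codes, 1, 2)

# List the sub-codes that fall under each main group
split(codes, main_group)
```

Keeping the codes as character rather than numeric matters here, because a numeric column would turn "01411" into 1411 and the prefix would no longer point at the right main group.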
structure(list(E1.Documentnummer = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27",
"28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38",
"39", "40", "41", "42", "43", "44", "45", "46", "47", "48", "49",
"50", "51", "52"), CI.SBI.Hoofdact = c("4941", "4941", "46901",
"4622", "46383", "52291", "3230", "52291", "4941", "52291", "4612",
"1013", "52291", "1013", "4622", "50201", "46472", "01411", "46311",
"01499", "4622", "1086", "1089", "1013", "4618", "2815", "4941",
"1071", "01303", "6612", "46499", "1013", "46311", "46473", "1013",
"1013", "4632", "46311", "46311", "6312", "4632", "4622", "4622",
"4661", "0162", "1011", "4632", "4632", "46762", "1811", "01305",
"10822")), row.names = c(NA, -52L), class = c("tbl_df", "tbl",
"data.frame"))
CodePudding user response:
It seems to me that you could use ggbump::geom_sigmoid() to get these nicely curved lines (no graph theory required here). Here is an example of how you could use it to draw the plot.
library(dplyr)
library(ggplot2)
library(ggbump)
# df <- structure(...) # omitted for brevity
df2 <- df %>%
  # Wrangle to count number of groups
  mutate(main = substr(CI.SBI.Hoofdact, 1, 2)) %>%
  group_by(main, CI.SBI.Hoofdact) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  # Determine graphical positions
  mutate(
    label = paste0(CI.SBI.Hoofdact, " (n = ", n, ")"),
    pos = match(CI.SBI.Hoofdact, sort(unique(CI.SBI.Hoofdact)))
  ) %>%
  group_by(main) %>%
  mutate(mid_pos = mean(pos))
#> `summarise()` has grouped output by 'main'. You can override using the `.groups`
#> argument.
ggplot(df2, aes(x = 0, y = pos)) +
  # Connections between points
  geom_sigmoid(
    aes(xend = 1, y = mid_pos, yend = pos,
        group = CI.SBI.Hoofdact, size = n)
  ) +
  # Left labels: one per main group, placed at the group's mean position
  geom_text(
    data = ~ .x[!duplicated(.x$main), ],
    aes(y = mid_pos, label = main),
    hjust = 1
  ) +
  # Right labels
  geom_text(aes(x = 1, label = label),
            hjust = 0) +
  # Controls thickness of lines
  scale_size_continuous(
    range = c(0, 3), limits = c(0, NA)
  ) +
  # Decorative stuff
  scale_x_continuous(
    expand = c(0, 0.5),
    breaks = c(0, 1),
    labels = c("Main Group", "Sub Group")
  ) +
  theme_void() +
  theme(axis.text.x.bottom = element_text())
Created on 2022-08-26 by the reprex package (v2.0.0)
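To make the positioning logic easier to follow, here is a small base R sketch of the same wrangling on a made-up vector of five codes. The input values and the name `counts` are hypothetical; the column names `n`, `main`, `pos` and `mid_pos` mirror the ones used above:

```r
# Hypothetical input: five companies, four distinct codes, three main groups
codes <- c("4622", "46311", "4941", "4941", "1013")

# Count companies per code (the group_by() + summarise(n = n()) step)
counts <- as.data.frame(table(code = codes),
                        responseName = "n",
                        stringsAsFactors = FALSE)

# Main group: first two digits of the code
counts$main <- substr(counts$code, 1, 2)

# Position of each sub-group: its rank among the sorted unique codes
counts$pos <- as.numeric(match(counts$code, sort(unique(counts$code))))

# Position of each main group: mean position of its sub-groups
counts$mid_pos <- ave(counts$pos, counts$main, FUN = mean)

counts
```

Each curve in the plot then runs from `mid_pos` on the left to `pos` on the right, so a main group with several sub-groups fans out from the middle of its children.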