I'm creating a sankey chart in R with networkD3::sankeyNetwork()
with the below sample data and script. I want to show percentage besides the node label.
the sankey with full dataset i create has 8 layers. i just post piece of data in below code.
library("networkD3")
library("htmlwidgets")
library("dplyr")
a <- read.csv(header = TRUE, text = "
date,dataCenter,customer,companyID,source,target,value
2021/11/3,dc1,customer1,companyID1,view_page_modulepicker_1,view_manage_opportunities_2,1
2021/11/3,dc1,customer1,companyID1,view_manage_opportunities_2,scroll_down_3,4
2021/11/3,dc1,customer1,companyID1,scroll_down_3,show_more_4,6
2021/12/6,dc1,customer1,companyID1,view_page_capabilityportfolio_1,view_card_detail_assignment_2,1
2021/12/7,dc2,customer2,companyID2,view_page_modulepicker_1,view_manage_opportunities_2,1
2021/12/8,dc3,customer3,companyID3,view_page_actionsearch_1,view_manage_opportunities_2,1
2021/12/9,dc2,customer1,companyID1,view_page_modulepicker_1,view_card_detail_assignment_2,1
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,view_card_detail_role_2,
2021/12/9,dc1,customer1,companyID1,view_page_modulepicker_1,view_detail_assignment_2,
2021/12/2,dc1,customer1,companyID1,view_page_homepage_1,view_card_detail_role_2,1
2021/12/2,dc1,customer1,companyID1,view_page_actionsource_1,view_detail_role_2,1
2021/12/2,dc2,customer2,companyID2,view_page_actionsource_1,bounce,22
")
node_names <- unique(c(as.character(a$source), as.character(a$target)))
nodes <- data.frame(name = node_names)
links <- data.frame(source = match(a$source, node_names) - 1,
target = match(a$target, node_names) - 1,
value = a$value)
# group by source and calculate the percentage of each node
g <- a %>%
group_by(source) %>%
summarize(cnt = n()) %>%
mutate(freq = round(cnt / sum(cnt) * 100, 2)) %>%
arrange(desc(freq))
nodes$name <- sub('(.*)_\\d ', '\\1', nodes$name)
links$linkgroup <- "linkgrp"
colourScale <-
'd3.scaleOrdinal()
.domain(["linkgrp"])
.range(["gainsboro"].concat(d3.schemeCategory20))'
p <- sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
Target = "target", Value = "value", NodeID = "name",
fontSize = 9,
fontFamily = "sans-serif", nodePadding=10,
margin = list(t=100),
sinksRight = FALSE, iterations = 0,
LinkGroup = "linkgroup",
colourScale = colourScale)
showLabel_string <-
'function(el, x){
d3.select(el).selectAll(".node text")
.text(d => d.name " (" d.value ")");}'
addTitle_string <-
'function(el) {
var cols_x = this.sankey.nodes().map(d => d.x 15).filter((v, i, a) => a.indexOf(v) === i).sort(function(a, b){return a - b});
cols_x.forEach((d, i) => {
d3.select(el)
.select("svg")
.append("text")
.attr("x", d)
.attr("y", 0).text("step" (i 1))
.style("font-size", "12px")
.style("font-family", "sans-serif")
.style("text-orientation", "upright");})
}'
p <- htmlwidgets::onRender(x = p, jsCode = showLabel_string)
p <- htmlwidgets::onRender(x = p, jsCode = addTitle_string)
p <- htmlwidgets::prependContent(p, htmltools::tags$h3("Opportunity Marketing User Behavior Monitor"))
p
Now I want to show the percentage besides each node label and count. I have already calculate the percentage value by below scirpt, but how to put it after node label and count?
I realize that below way to calculate the percentage for each node is not correct because when grouping by 'source' column, the nodes in last layer are missed as they are working as 'target' nodes only. I update the expected result with a new picture in the post which is more clear for how the percentage shown. In general, the percentage should follow the conservation of energy. Is it possible to achieved?
g <- a %>%
group_by(source) %>%
summarize(cnt = n()) %>%
mutate(freq = round(cnt / sum(cnt) * 100, 2)) %>%
arrange(desc(freq))
CodePudding user response:
You can add variables to the nodes
data.frame after the htmlwidget is created (otherwise, sankeyNetwork()
will only keep the required columns). Then you can edit the custom code for the text of the node labels to include the percentage...
p$x$nodes <- g %>%
mutate(name = sub("_[0-9]", "", source)) %>%
select(name, freq) %>%
right_join(p$x$nodes, by = "name") %>%
mutate(freq = ifelse(is.na(freq), "", paste0(freq, "%")))
showLabel_string <-
'function(el, x){
d3.select(el).selectAll(".node text")
.text(d => d.name " (" d.value ") " d.freq);}'