I'm working with ggparty and trying to customize the bar charts at each terminal node of a partykit plot (with `geom_node_plot()`). I want the lengths of some bars (and their labels) to show a proportion of the total predicted cases, where the denominator of that percentage is the number of cases at that node only, not the number in the entire data set.
(This is a common enough question within ggplot2 alone, but the difficulty here is specific to working inside of ggparty.)
```r
library(ggplot2)
library(partykit)
library(ggparty)

data("WeatherPlay", package = "partykit")
wptree <- ctree(play ~ outlook, data = WeatherPlay,
                control = ctree_control(minsplit = 1, minbucket = 1, mincriterion = .01))

ggparty(wptree) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_plot(gglist = list(
    aes(
      y = play,
      # intended: share of cases at this node; actually uses ALL cases as denominator
      x = after_stat(count / sum(count))
    ),
    geom_bar()
  ))
```
The vignette for ggparty states that each node plot is "separate", which suggests that the data are split by node, but this doesn't seem to apply to `count`: as you can see, the percentages are computed out of all the data in the dataset.
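To make the two denominators concrete, here is a small base-R sketch on hypothetical bar counts for two node panels (the numbers are made up, not taken from `WeatherPlay`). A contingency table stands in for the layer data; `prop.table()` without a margin mimics `count / sum(count)`, while `margin = 1` divides each panel's row by that panel's own total.

```r
# Hypothetical counts per bar, as stat_count() might produce them for two
# node panels (PANEL) and the two levels of play
stat_data <- data.frame(
  PANEL = factor(c(1, 1, 2, 2)),
  play  = factor(c("yes", "no", "yes", "no")),
  count = c(2, 3, 4, 5)
)
tab <- xtabs(count ~ PANEL + play, data = stat_data)

# What count / sum(count) amounts to: one denominator (14) shared by all panels
global <- prop.table(tab)

# What is wanted: each panel's counts divided by that panel's total (5 and 9)
within <- prop.table(tab, margin = 1)

print(round(global, 3))
print(round(within, 3))
```

Each row of `within` sums to 1, whereas only the whole of `global` does.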
I know that a slightly different method would be to use `prop` instead of `count / sum(count)`, and this allows the "by" parameter to specify the grouping. That seems ideal, but what grouping variable is available here that would separate the nodes? The party `nodeid` does not seem to be in the environment of each `geom_bar()` call.
Any custom groupwise calculation (e.g. with `lapply`) would also require a variable that identifies the terminal node, or otherwise distinguishes between the plots, so that seems to be the key issue.
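For illustration, if such a node identifier were available, the `lapply` route could look like the sketch below. The `node` column and the counts are hypothetical; the point is only that `split()`/`unsplit()` need that grouping variable to normalize each node separately and restore the original row order.

```r
# Hypothetical layer data that *does* carry a node identifier
stat_data <- data.frame(
  node  = factor(c(1, 1, 2, 2)),
  count = c(2, 3, 4, 5)
)

# Split the counts by node, divide each piece by its own total,
# then reassemble the pieces in the original row order
pieces    <- lapply(split(stat_data$count, stat_data$node), function(x) x / sum(x))
node_prop <- unsplit(pieces, stat_data$node)

print(node_prop)
```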
Thanks for any guidance.
Answer:
As far as I can tell, the bar charts for the nodes are added as a faceted plot. To get the percentages by panel you can use basically the same approach as for a standard ggplot: as you already guessed, you have to take the grouping into account, and for the panels you can get it from `PANEL`:
```r
library(ggparty)  # also attaches ggplot2 and partykit

data("WeatherPlay", package = "partykit")
wptree <- ctree(play ~ outlook, data = WeatherPlay,
                control = ctree_control(minsplit = 1, minbucket = 1, mincriterion = .01))

# Divide each bar's count by the total count of the panel (node) it belongs to
panel_prop <- function(count, panel) {
  count / tapply(count, panel, sum)[as.character(panel)]
}

ggparty(wptree) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_plot(gglist = list(
    aes(
      y = play,
      x = after_stat(panel_prop(count, PANEL))
    ),
    geom_bar()
  ))
```
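A quick base-R check of `panel_prop()` on hypothetical `count` and `PANEL` values (not actual ggparty layer data), next to an equivalent formulation with `ave()`, which returns each row's group total directly. `tapply()` returns one total per panel level, and indexing it by `as.character(panel)` repeats the right total for every bar; `as.numeric()` just strips the array attributes that come along with it.

```r
# panel_prop() as defined in the answer
panel_prop <- function(count, panel) {
  count / tapply(count, panel, sum)[as.character(panel)]
}

# Hypothetical per-bar counts for two panels
count <- c(2, 3, 4, 5)
PANEL <- factor(c(1, 1, 2, 2))

p1 <- as.numeric(panel_prop(count, PANEL))

# Same result with ave(): each count divided by the sum of its own panel
p2 <- count / ave(count, PANEL, FUN = sum)

print(p1)
print(p2)
```

If `ave()` behaves the same way inside `after_stat()` (an assumption I have not checked in ggparty), `x = after_stat(count / ave(count, PANEL, FUN = sum))` would avoid the helper function.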