Groupwise percentage in ggparty node plots

Time:09-24

I'm working with ggparty and trying to customize the bar charts at each terminal node of a partykit plot (with geom_node_plot). I want the lengths of some bars (and their labels) to show a proportion of the total predicted cases -- where the denominator in that percentage is from cases at that node only, not out of the entire data set.

(This is a common enough question within ggplot2 alone, but the difficulty here is specific to working inside of ggparty.)

require(ggplot2)
require(partykit)
require(ggparty)

data("WeatherPlay", package = "partykit")
wptree <- ctree(play ~ outlook, data = WeatherPlay,
                control = ctree_control(minsplit = 1, minbucket = 1, mincriterion = .01))

ggparty(wptree) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_plot(gglist = list(
    aes(
      y = play,
      # proportion of ALL cases, not of the cases in this node
      x = after_stat(count / sum(count))
    ),
    geom_bar()
  ))

The vignette for ggparty states that each node plot is "separate", suggesting that the data are segregated, but this doesn't seem to apply to count, for as you can see, the percentages are out of all data in the dataset.

I know that a slightly alternate method would be to use prop instead of count / sum(count) and this allows the "by" parameter to specify the grouping. That seems ideal -- but what is the grouping variable available here, that would segregate the nodes? The party nodeid does not seem to be in the environments for each geom_bar call.

Any custom groupwise calculation, e.g. with lapply, also would require a variable to use, specifying the terminal node, or otherwise distinguishing between plots -- so that seems to be the key issue.

Thanks for any guidance.

CodePudding user response:

As far as I can tell, the bar charts for the nodes are added as a faceted plot. To get percentages per panel you can use the same approach as in a standard ggplot: as you already guessed, you have to account for the grouping, and for the panels that grouping is available through the computed PANEL variable:

library(ggparty)

data("WeatherPlay", package = "partykit")
wptree <- ctree(play ~ outlook, data=WeatherPlay, control = ctree_control(minsplit=1, minbucket=1, mincriterion = .01))

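# Divide each bar's count by the total count of the panel (node) it belongs to
panel_prop <- function(count, panel) {
  count / tapply(count, panel, sum)[as.character(panel)]
}

ggparty(wptree) +
  geom_edge() +
  geom_edge_label() +
  geom_node_splitvar() +
  geom_node_plot(gglist = list(
    aes(
      y = play,
      # PANEL identifies the node plot each bar is drawn in
      x = after_stat(panel_prop(count, PANEL))
    ),
    geom_bar()
  ))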
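
To see what panel_prop does without drawing a tree, here is a minimal sketch on made-up numbers. The vectors toy_count and toy_panel are hypothetical stand-ins for the count and PANEL columns that ggplot2 computes internally; they are not taken from WeatherPlay.

```r
# Same helper as above: each count divided by the total of its own panel
panel_prop <- function(count, panel) {
  count / tapply(count, panel, sum)[as.character(panel)]
}

# Hypothetical data: three panels (nodes) with two bars each
toy_count <- c(3, 2, 4, 1, 2, 2)
toy_panel <- factor(c(1, 1, 2, 2, 3, 3))

# Denominator is the whole data set (what count / sum(count) gives)
global_prop <- toy_count / sum(toy_count)

# Denominator is the panel total (what panel_prop gives)
node_prop <- as.numeric(panel_prop(toy_count, toy_panel))

print(round(global_prop, 3))
print(node_prop)
```

With the panel-wise version the proportions inside each panel add up to 1, whereas the global version only adds up to 1 across all panels together.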
