Line chart plotting nested categorical values for multiple groups (ggplot2)

Time:10-01

I have data on how different hotel brands scored on a customer satisfaction survey. The survey questions are divided into categories and not all categories have the same number of questions. See the data below:

hotels = data.frame(category = rep(c("room","room","service","service","overall rating"),each = 3),
                   subcategory = rep(c("comfort","cleanliness","professionalism","promptness","overall rating"),each = 3),
                   brand = rep(c("hotel 1","hotel 2","hotel 3"),times = 5),
                   score = c(6,10,4,7,9,2,6,9,5,9,7,3,6,8,3))

I need to plot the data as a line chart visualizing each brand's score on each question/subcategory with the broader categories labeled as well. My first attempt looked like this:

# factor variables so they appear in correct order when plotted
hotels$category = factor(hotels$category, levels = c("room","service","overall rating"))
hotels$subcategory = factor(hotels$subcategory, levels =c("comfort","cleanliness","professionalism","promptness","overall rating"))

# plot
library(dplyr)
library(ggplot2)

p = hotels %>%
  ggplot(aes(x=subcategory, y=score, group=brand, color=brand)) +
  geom_line() +
  geom_point()

p

Here is the plot. This would be fine if I didn't need subcategory and category labels, but I do. Next, I attempted faceting:

p = hotels %>%
  ggplot(aes(x=subcategory, y=score, group=brand, color=brand)) +
  geom_line() +
  geom_point() +
  facet_grid(~category,
             scales = "free_x",
             space = "free_x",
             switch = "x") +
  theme(panel.spacing = unit(0, units = "cm"),
        strip.placement = "outside")

p

This was the result. As you can see, the lines are broken up between the facets. How do I create a chart with all points connected like in the first example, but with the categories and subcategories labeled like in the second example? In case it isn't obvious, I'm newer to ggplot2, so I appreciate any solutions you may have.

P.S. These solutions address similar issues, but are not quite what I need:

EDIT: teunbrand's answer below was exactly what I needed. I adjusted the code slightly to put the categories in the correct order (factoring the pasted variable did not work). The final code looks like this:

# data
hotels = data.frame(category = rep(c("room","room","service","service","overall rating"),
                                   each = 3),
                    subcategory = rep(c("comfort","cleanliness","professionalism",
                                        "promptness","overall rating"),each = 3),
                    brand = rep(c("hotel 1","hotel 2","hotel 3"),times = 5),
                    score = c(6,10,4,7,9,2,6,9,5,9,7,3,6,8,3))

# add pasted variable directly to data set
hotels$paste = paste0(hotels$subcategory, "&", hotels$category)

# plot
library(dplyr)
library(ggplot2)
library(ggh4x)
library(forcats)
p = hotels %>%
  # use mutate() from dplyr with fct_relevel() from forcats to reorder categories
  mutate(paste = fct_relevel(paste,
                             "comfort&room", "cleanliness&room", "professionalism&service",
                             "promptness&service", "overall rating&overall rating")) %>%
  # x is the reordered pasted variable
  ggplot(aes(x=paste,
             y=score, group=brand, color=brand)) +
  geom_line() +
  geom_point() +
  # split each label at "&" into a subcategory line and a category line
  guides(x = ggh4x::guide_axis_nested(delim = "&"))

p

And the final plot looks like this.

Thank you for your help!
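
As an alternative to listing every level by hand in fct_relevel(), the levels could be taken from the order in which the pasted labels first appear. This is a minimal base R sketch, assuming the rows of hotels are already in the desired axis order (as they are in the example data); the names label_chr and label are made up for illustration.

```r
# Example data from the question
hotels = data.frame(
  category = rep(c("room", "room", "service", "service", "overall rating"), each = 3),
  subcategory = rep(c("comfort", "cleanliness", "professionalism",
                      "promptness", "overall rating"), each = 3),
  brand = rep(c("hotel 1", "hotel 2", "hotel 3"), times = 5),
  score = c(6, 10, 4, 7, 9, 2, 6, 9, 5, 9, 7, 3, 6, 8, 3)
)

# Combined "subcategory&category" labels (hypothetical helper vector)
label_chr = paste0(hotels$subcategory, "&", hotels$category)

# Levels follow the order of first appearance in the data;
# this assumes the rows are already sorted the way the axis should read
hotels$label = factor(label_chr, levels = unique(label_chr))

levels(hotels$label)
```

Mapping x = label in the plot should then keep this order, since ggplot2 places factor levels on a discrete axis in level order. If the rows were not already sorted, the levels would still need to be given explicitly.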

CodePudding user response:

Here is one option with ggh4x::guide_axis_nested(). You can combine labels of super- and subcategories, which the guide will separate into different lines. Disclaimer: I'm the author of that function.

library(ggplot2)

hotels = data.frame(category = rep(c("room","room","service","service","overall rating"),each = 3),
                    subcategory = rep(c("comfort","cleanliness","professionalism","promptness","overall rating"),each = 3),
                    brand = rep(c("hotel 1","hotel 2","hotel 3"),times = 5),
                    score = c(6,10,4,7,9,2,6,9,5,9,7,3,6,8,3))

hotels$category = factor(hotels$category, levels = c("room","service","overall rating"))
hotels$subcategory = factor(hotels$subcategory, levels =c("comfort","cleanliness","professionalism","promptness","overall rating"))

# plot (ggh4x must be installed; ggplot2 is already loaded above)
library(dplyr)

hotels %>%
  ggplot(aes(x=paste0(subcategory, "&", category),
             y=score, group=brand, color=brand)) +
  geom_line() +
  geom_point() +
  guides(x = ggh4x::guide_axis_nested(delim = "&"))

Created on 2021-09-29 by the reprex package (v2.0.1)
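
To see what the combined labels look like, and why the axis order can differ from the factor levels set earlier, here is a small base R sketch. It only illustrates the label format: strsplit() stands in for the splitting step and is not a claim about how ggh4x implements it internally.

```r
subcategory = c("comfort", "cleanliness", "professionalism",
                "promptness", "overall rating")
category = c("room", "room", "service", "service", "overall rating")

# One combined label per question, joined with the delimiter passed to the guide
labels = paste0(subcategory, "&", category)

# Splitting at the delimiter gives the two label lines:
# the subcategory first, then the broader category
parts = strsplit(labels, "&", fixed = TRUE)
parts[[1]]

# paste0() returns a plain character vector, so any factor levels are lost.
# A character variable on a discrete axis is ordered alphabetically by default:
sort(labels)
```

This is why the pasted variable may need its order set explicitly (for example as a factor) before plotting.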
