How to deal with long legend labels in R ggplot

Time:12-13

I am working on a sectoral comparison of a stock index, and I want to show how employee numbers correlate with profit, sales, etc. However, when I make a scatter plot with points colored by sector and add a legend, the sector names are so long that the plot becomes unreadable. I can work around this with faceting, but I want to show all the sectoral data in one plot to make it easy to compare.

Here is the toy data frame:

dataset <- structure(list(Dates = c("2021-12-31", "2021-12-31", "2021-12-31", 
"2021-12-31", "2021-12-31", "2021-12-31", "2021-12-31", "2021-12-31", 
"2021-12-31", "2021-12-31", "2021-12-31", "2021-12-31", "2021-12-31", 
"2021-12-31", "2021-12-31"), Firm = c("BRYAT", "ZRGYO", "KONTR", 
"TRGYO", "DAPGM", "INVES", "ECZYT", "GWIND", "KMPUR", "SNGYO", 
"BERA", "ISGYO", "ALKA", "YGGYO", "SRVGY"), Sector = c("FINANCIAL AND INSURANCE ACTIVITIES", 
"FINANCIAL AND INSURANCE ACTIVITIES", "INFORMATION AND COMMUNICATION", 
"REAL ESTATE ACTIVITIES", "REAL ESTATE ACTIVITIES", "FINANCIAL AND INSURANCE ACTIVITIES", 
"FINANCIAL AND INSURANCE ACTIVITIES", "MANUFACTURING", "MANUFACTURING", 
"REAL ESTATE ACTIVITIES", "MANUFACTURING", "REAL ESTATE ACTIVITIES", 
"MANUFACTURING", "FINANCIAL AND INSURANCE ACTIVITIES", "FINANCIAL AND INSURANCE ACTIVITIES"
), `Number of Employees` = c(9, 32, 226, 144, 138, 6, 3, 50, 
219, 194, 33, 69, 208, 138, 11), EBITDA = c(23.8137, 1113.5212, 
113.9684, 6517.373, 387.6152, 1624.0192, -7.8884, 420.6019, 280.8024, 
4648.4098, 752.1899, 153.6431, 138.7504, 194.0958, 144.2635), 
    `Profit Margin` = c(574.7712, 706.7635, 21.3576, 357.6667, 
    34.2274, 537.1883, 372.8143, 31.6373, 14.5637, 150.702, 17.7025, 
    48.8762, 26.2978, 252.0494, 162.3127), Sales = c(47.7833, 
    186.8011, 611.8077, 1483.729, 912.4001, 312.5418, 0, 540.2263, 
    2172.6885, 2123.4395, 4217.1842, 545.2598, 700.9869, 277.6963, 
    414.2354), `Market Cap` = c(10681.875, 11405.4966, 2474.0625, 
    3920, NA, NA, 5407.5, 3123.1821, NA, 3728.536, 3320.352, 
    2435.225, 859.425, 3386.88, 4695.6), `Personnel Expense (Millions)` = c(3.4716, 
    8.4842, 7.7846, 11.869, 14.3014, 1.1789, 0.4507, 17.2696, 
    55.9705, 25.5093, 308.7577, 20.0843, 33.358, 10.5289, 14.3268
    ), `Personnel Expense Per Employee` = c(385733.5556, 265130.0938, 
    34445.1726, 82423.6111, 103633.6739, 196478.3333, 150236.6667, 
    345392.94, 255573.0411, 131491.1082, 9356294.7879, 291076.971, 
    160374.9663, 76296.7319, 1302436.6364), `Price/Earnings` = c(38.8935, 
    7.8922, 18.934, 0.7387, NA, NA, 12.5925, 18.2735, NA, 1.1651, 
    4.4476, 9.1377, 4.6621, 4.6933, 6.9838)), row.names = c(NA, 
-15L), class = c("tbl_df", "tbl", "data.frame"))

I tried making the legend text smaller, but then the legend itself becomes unreadable. I am pretty sure that I can wrap the text, but I couldn't find the code for it.

library(dplyr)
library(ggplot2)

ggplot(dataset %>%
         filter(Dates %in% "2021-12-31" & `Number of Employees` < 250)) +
  geom_point(aes(x = `Number of Employees`, y = `Profit Margin`, color = Sector), show.legend = TRUE) +
  lims(y = c(-100, 200)) +
  theme(legend.text = element_text(size = 6, hjust = 0.5, lineheight = 121))

CodePudding user response:

I suggest you move the legend to the bottom (or top) and set the number of rows:

dataset %>%
  filter(Dates %in% "2021-12-31", `Number of Employees` < 250) %>%
  ggplot(aes(x = `Number of Employees`, y = `Profit Margin`)) +
  geom_point(aes(color = Sector)) +
  lims(y = c(-100, 200)) +
  theme(
    legend.text = element_text(size = 6, hjust = 0.5, lineheight = 121),
    legend.position = "bottom") +
  guides(color = guide_legend(nrow = 2))

ggplot2 with legend on the bottom, 2 rows

CodePudding user response:

I looked at Hadley Wickham's GitHub and found the legend.key.size and legend.key.width theme options.

dataset %>%
  filter(Dates %in% "2021-12-31", `Number of Employees` < 250) %>%
  ggplot(aes(x = `Number of Employees`, y = `Profit Margin`)) +
  geom_point(aes(color = Sector)) +
  lims(y = c(-50, 200)) +
  theme(
    legend.text = element_text(size = 5, vjust = 0.1),
    legend.position = "bottom",
    legend.key.width = unit(0.05, "cm"),
    legend.key.size = unit(0.1, "cm")) +
  guides(color = guide_legend(ncol = 3, nrow = 5))

CodePudding user response:

The scales package provides label_wrap() to wrap long strings at natural breakpoints after a specified number of characters. In combination with r2evans' suggestions about legend positioning and formatting, you should be able to achieve readable text for many labels.
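
As a quick check of what the wrapping does, here is a minimal sketch that applies label_wrap() directly to the sector names from the toy data. The width of 20 matches the plot code in this answer; the names sectors, wrap20 and wrapped are only illustrative.

```r
library(scales)

# The four sector names that appear in the toy data frame
sectors <- c(
  "FINANCIAL AND INSURANCE ACTIVITIES",
  "INFORMATION AND COMMUNICATION",
  "REAL ESTATE ACTIVITIES",
  "MANUFACTURING"
)

# label_wrap() returns a labelling function; calling that function
# replaces some of the spaces between words with "\n"
wrap20 <- label_wrap(20)
wrapped <- wrap20(sectors)

# cat() renders the line breaks roughly the way the legend would
cat(wrapped, sep = "\n\n")
```

Labels that already fit within the width come back unchanged, so short names such as MANUFACTURING stay on one line.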

Plot with example data


dataset %>%
  filter(Dates %in% "2021-12-31" & `Number of Employees` < 250) %>%
  ggplot(aes(x = `Number of Employees`, y = `Profit Margin`, color = Sector)) +
  geom_point(size = 3) +
  scale_colour_viridis_d(
    option = "turbo", end = 0.8,
    labels = scales::label_wrap(20),
    guide = guide_legend(nrow = 2)
  ) +
  theme_bw() +
  theme(
    legend.text = element_text(size = 7, hjust = 0.5),
    legend.key.height = unit(10, "mm"), # increase this to space out the wrapped labels
    legend.position = "bottom" # might be more space down here, depends on plot dimensions
  )

Distinguishing 14 colours will be tricky. My suggestion is to use/abuse the turbo palette, as above, or to consider using both shape and colour to distinguish your categories.
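
The shape-plus-colour idea could look roughly like the sketch below. It is not run on the real data: the demo data frame, its 14 SECTOR A to SECTOR N categories and the choice of seven shape codes are all made up for illustration. Mapping both aesthetics to the same variable should give a single combined legend, because both scales share the same title and labels.

```r
library(ggplot2)

# Made-up stand-in data: 14 hypothetical sectors with 4 points each
set.seed(42)
sector_names <- paste("SECTOR", LETTERS[1:14])
demo <- data.frame(
  x = runif(56),
  y = runif(56),
  Sector = factor(rep(sector_names, each = 4), levels = sector_names)
)

# Seven point shapes, recycled twice to cover the 14 categories
# (an arbitrary choice; any set of distinguishable shapes would do)
shape_codes <- rep(c(16, 17, 15, 18, 8, 3, 4), times = 2)

p <- ggplot(demo, aes(x, y, colour = Sector, shape = Sector)) +
  geom_point(size = 3) +
  scale_colour_viridis_d(option = "turbo", end = 0.8) +
  scale_shape_manual(values = shape_codes) +
  # Give both guides the same layout so they can share one legend
  guides(
    colour = guide_legend(nrow = 3),
    shape = guide_legend(nrow = 3)
  ) +
  theme_bw() +
  theme(legend.position = "bottom")

print(p)
```

Categories that share a shape sit seven steps apart on the palette, so every sector still gets its own colour and shape combination.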
