Home > Net >  Making multi-group line plot with many observations more readable
Making multi-group line plot with many observations more readable

Time:03-08

I have created the following plot:

Ugly plot

From a bigger version (5 rows, 58 columns) of this df:

df <- data.frame(row.names = c("ROBERT", "FRANK", "MICHELLE", "KATE"), `1` = c(31, 87, 22, 12), `2` = c(37, 74, 33, 20), `3` = c(35, 32, 44, 14))
colnames(df) <- c("1", "2", "3")

In the following manner:

df = df %>% 
  rownames_to_column("Name") %>% 
  as.data.frame()

df <- melt(df ,  id.vars = 'Name', variable.name = 'ep')
ggplot(df, aes(ep,value))   geom_line(aes(colour = Name, group=Name))

The plot kind of shows what I'd like to, but it really is a mess. Does anyone have a suggestion that would help me increasing its readability? Any help is very much appreciated!

CodePudding user response:

The best approach will really depend on what you're trying to understand or communicate with your plot. But here are a few options for visualizing lots of datapoints across a smallish number of cases. These are illustrated with a subset of the txhousing data included with ggplot2.

Solution 1: Faceting

As @rdelrossi suggested, one solution is to facet by Name:

library(ggplot2)

ggplot(df, aes(ep,value))   
  geom_line(aes(colour = Name, group=Name), show.legend = FALSE)  
  scale_x_continuous(expand = c(0,0))  
  facet_wrap(vars(Name), ncol = 1, scales = "free_x")  
  theme_bw()

Solution 2: Smoothing

Another option is to use geom_smooth() to smooth out local fluctuations to see larger longer-term trends:


ggplot(df, aes(ep,value))   
  geom_smooth(
    aes(colour = Name, group=Name), 
    se = FALSE,
    span = 1,    # higher number = smoother
    size = 1.25
  )  
  scale_x_date(expand = c(0,0))  
  theme_bw()

Solution 3: Lasagna

Sometimes called a "lasagna plot," this is a heatmap with cases on the y axis, time (or whatever) on the x axis, and values mapped to color. It's a different way of comparing changes within (left to right) and between (up and down) individuals.

ggplot(df, aes(ep, Name, colour = value, fill = value))   
  geom_tile(size = .5)  
  scale_fill_viridis_c(option = "B", aesthetics = c("colour", "fill"))  
  coord_cartesian(expand = FALSE)  
  theme(
    axis.text.y = element_text(size = 12, face = "bold"),
    axis.title.y = element_blank()
  )

Data prep:

library(dplyr)
library(lubridate)

df <- txhousing %>% 
  filter(
    city %in% c("Beaumont", "Amarillo", "Arlington", "Corpus Christi", "El Paso"),
    between(year, 2004, 2012)
  ) %>% 
  group_by(city) %>% 
  mutate(
    Name = city, 
    value = scale(sales),
    ep = ym(str_c(year, month))
  ) %>% 
  ungroup()

CodePudding user response:

If your readability concern is just the x axis labels, then I think the main issue is that when you use reshape2::melt() the result is that the column ep is a factor which means that the x axis of your plot will show all the levels and get crowded. The solution is to convert it to numeric and then it will adjust the labels in a sensible way.

I replace your use of reshape2::melt() with tidyr::pivot_longer() which has superseded it within the {tidyverse} but your original code would still work.

library(tidyverse)

df <- structure(list(`1` = c(31, 87, 22, 12), `2` = c(37, 74, 33, 20), `3` = c(35, 32, 44, 14)), class = "data.frame", row.names = c("ROBERT", "FRANK", "MICHELLE", "KATE"))

df %>% 
  rownames_to_column("Name") %>% 
  pivot_longer(-Name, names_to = "ep") %>%
  mutate(ep = as.numeric(ep)) %>% 
  ggplot(aes(ep, value, color = Name))  
  geom_line()

Created on 2022-03-07 by the enter image description here

Also you can add facet_grid(~Name)

enter image description here

Also you can add

geom_text(aes(label=value), position = position_stack(vjust = .5))

enter image description here

  • Related