Is there a better representation which shows multiple lines in ggplot

Time:01-20

I have plotted 100 lines, each colored by its score. The lines overlap so heavily that it is very difficult to see any pattern in the figure.

Is there a better representation that shows how the lines and their scores are linked to one another?

I believe some kind of density plot can show the pattern.


library(tidyverse)

x <- rep(seq(0, 3.2, 0.01), times = 100)
score <- rep(1:100, each = 321)
y = runif(32100) * score * 0.01  # one random value per row (321 x values * 100 scores)


df <- tibble(x = x, 
             score = score, 
             y = y)


ggplot(data = df,
       aes(x = x, 
           y = y, 
           group = score,
           color = score)) +
  geom_line(linewidth = 0.15) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  theme_bw() +
  theme(aspect.ratio = 0.5) +
  # legend.position="none")
  scale_color_gradient(low = 'blue',  high = 'yellow')

CodePudding user response:

The sample data is simply too noisy and too dense to show in an unfiltered line plot. One option is to show a summary of each line via geom_smooth, which draws one smoothed curve per score group. Although you lose detail in the data, it lets the plot convey the message you want it to show.

library(tidyverse)

x <- rep(seq(0, 3.2, 0.01), times = 100)
score <- rep(1:100, each = 321)
y = runif(32100) * score * 0.01


df <- tibble(x = x, 
             score = score, 
             y = y)


ggplot(data = df,
       aes(x = x, 
           y = y, 
           group = score,
           color = score)) +
  geom_smooth(linewidth = 0.5, se = FALSE) +
  theme_bw() +
  theme(aspect.ratio = 0.5) +
  scale_color_gradient(low = 'blue',  high = 'yellow')
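
If the smoothed curves still overlap too much, the same "summarise each line" idea can be pushed one step further by reducing every line to a single number and plotting that number against the score. The sketch below is only an illustration under assumptions: it uses the mean of y as the per-line summary, and the names line_summary and mean_y as well as the set.seed() call are hypothetical additions, not part of the original question.

```r
library(tidyverse)

set.seed(1)  # assumption: added only so the sketch is reproducible

# Same simulated data as in the question: 100 lines with 321 points each
df <- tibble(x = rep(seq(0, 3.2, 0.01), times = 100),
             score = rep(1:100, each = 321),
             y = runif(32100) * score * 0.01)

# Collapse each line (one per score) to its mean y value
line_summary <- df %>%
  group_by(score) %>%
  summarise(mean_y = mean(y), .groups = "drop")

# One point per line, so the link between score and y level is visible directly
ggplot(line_summary, aes(x = score, y = mean_y)) +
  geom_point() +
  theme_bw() +
  theme(aspect.ratio = 0.5)
```

This trades away all information along x, so it only makes sense when the overall level of each line, rather than its shape, is what needs to be shown.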


CodePudding user response:

What about a heat map? You could make one by binning both x and y into categories and then taking the average score in each x-y combination.

library(tidyverse)

x <- rep(seq(0, 3.2, 0.01), times = 100)
score <- rep(1:100, each = 321)
y = runif(32100) * score * 0.01


df <- tibble(x = x, 
             score = score, 
             y = y) %>% 
  mutate(x_cat = cut(x, breaks=11), 
         y_cat = cut(y, breaks=11)) %>% 
  group_by(x_cat, y_cat) %>% 
  summarise(score = mean(score), 
            x = median(range(x)), 
            y=median(range(y)))
#> `summarise()` has grouped output by 'x_cat'. You can override using the
#> `.groups` argument.


ggplot(df, aes(x=x_cat, y=y_cat, fill=score)) +
  geom_tile() +
  scale_fill_gradient(low = 'blue',  high = 'yellow') +
  scale_x_discrete(labels=sprintf("%.2f", sort(unique(df$x)))) +
  scale_y_discrete(labels=sprintf("%.6f", sort(unique(df$y)))) +
  theme_classic() +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="X", y="Y", fill="Average\nScore")

Created on 2023-01-19 by the reprex package (v2.0.1)
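
As a possible shortcut, the binning and averaging described above can also be left to ggplot2 itself. The sketch below is an alternative, not the code used in this answer: it relies on stat_summary_2d(), which bins x and y and applies a summary function to the z aesthetic in each bin. The choice of 11 bins mirrors the breaks=11 used above; the names df_raw and p and the set.seed() call are assumptions added for illustration.

```r
library(tidyverse)

set.seed(1)  # assumption: added only so the sketch is reproducible

# Same simulated data as above, kept unsummarised
df_raw <- tibble(x = rep(seq(0, 3.2, 0.01), times = 100),
                 score = rep(1:100, each = 321),
                 y = runif(32100) * score * 0.01)

# stat_summary_2d() bins x and y, then averages z (here: score) per bin
p <- ggplot(df_raw, aes(x = x, y = y, z = score)) +
  stat_summary_2d(fun = mean, bins = 11) +
  scale_fill_gradient(low = 'blue', high = 'yellow') +
  theme_classic() +
  labs(x = "X", y = "Y", fill = "Average\nScore")

p
```

Because the axes stay continuous here, no manual tick labels are needed, and the bin count can be changed in one place.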
