How to assign colors to multicolor scatter plot with multicolor fitted lines in ggplot2

Time:09-22

Problem

I have some data points stored in a data.frame with three variables: x, y, and gender. My goal is to draw several generally fitted lines, plus lines fitted specifically for male/female, over the scatter plot, with points coloured by gender. It sounds easy, but some issues persist.

What I currently do is use a new set of x values and predict y for every model, combine the fitted lines in a data.frame, and then convert wide to long, with the model name as the third variable (from this post: Points cannot be coloured

Code for another method (two lines omitted), which generates a same-colour legend and a multicolour plot:

ggplot(data = prdf, aes(x = x, y = fit)) +   # prdf and prdm are just data frames containing the x's and fitted values for different models
       geom_line(aes(lty = "Female"), colour = "chocolate") +
       geom_line(data = prdm, aes(x = x, y = fit, lty = "Male"), colour = "darkblue") +
       geom_point(data = data, aes(x = x, y = y, colour = gender)) +
       scale_colour_discrete(name = "Gender", breaks = c(0, 1),
                             labels = c("Male", "Female"))


Answer

This is related to using the colour aesthetic for lines and the fill aesthetic for points in your own (first) example. In the second example, it works because the colour aesthetic is used for both lines and points.

By default, geom_point cannot display a variable mapped to fill, because the default point shape (19) doesn't have a fill.

For fill to work on points, you have to specify one of the fillable shapes (21 to 25) in geom_point(), outside of aes(), for example shape = 21.

Perhaps this small reproducible example helps to illustrate the point:

Simulate data

set.seed(4821)
x1 <- rnorm(100, mean = 5)

set.seed(4821)
x2 <- rnorm(100, mean = 6)

data <- data.frame(x = rep(seq(20,80,length.out = 100),2),
                   tc = c(x1, x2),
                   gender = factor(c(rep("Female", 100), rep("Male", 100))))

Fit models

# Quadratic fit of tc on x, separately for each gender
slrmen <- lm(tc ~ x + I(x^2), data = data[data$gender == "Male", ])
slrwomen <- lm(tc ~ x + I(x^2), data = data[data$gender == "Female", ])

newdat <- data.frame(x = seq(20,80,length.out = 200))

# Predicted values on the new x grid, labelled by gender
fitted.male <- data.frame(x = newdat$x,
                          gender = "Male",
                          tc = predict(object = slrmen, newdata = newdat))
fitted.female <- data.frame(x = newdat$x,
                            gender = "Female",
                            tc = predict(object = slrwomen, newdata = newdat))
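As an optional variation, the two prediction data frames could also be built by one helper and stacked into a single long data frame. This is only a minimal sketch and not part of the original answer: the names `fit_by_gender` and `fitted.all` are hypothetical, and the data is re-simulated here so that the block runs on its own.

```r
# Re-create the simulated data so this block is self-contained
set.seed(4821)
x1 <- rnorm(100, mean = 5)
set.seed(4821)
x2 <- rnorm(100, mean = 6)
data <- data.frame(x = rep(seq(20, 80, length.out = 100), 2),
                   tc = c(x1, x2),
                   gender = factor(c(rep("Female", 100), rep("Male", 100))))

newdat <- data.frame(x = seq(20, 80, length.out = 200))

# Hypothetical helper: fit the quadratic model for one gender, predict on newdat
fit_by_gender <- function(g) {
  model <- lm(tc ~ x + I(x^2), data = data[data$gender == g, ])
  data.frame(x = newdat$x,
             gender = g,
             tc = unname(predict(model, newdata = newdat)))
}

# Stack the per-gender predictions into one long data frame
fitted.all <- do.call(rbind, lapply(levels(data$gender), fit_by_gender))
fitted.all$gender <- factor(fitted.all$gender, levels = levels(data$gender))
```

With the curves in one data frame, a single geom_line(data = fitted.all) layer should be enough, because mapping colour to gender also splits the line into one group per gender.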

Plot using colour aesthetics

Use the colour aesthetic for both points and lines (specify it in ggplot() so that it is inherited by every layer). By default, geom_point can map a variable to colour.

library(ggplot2)

ggplot(data, aes(x = x, y = tc, colour = gender)) +
  geom_point() +
  geom_line(data = fitted.male) +
  geom_line(data = fitted.female) +
  scale_colour_manual(values = c("tomato", "blue")) +
  theme_bw()

Plot using colour and fill aesthetics

Use the fill aesthetic for points and the colour aesthetic for lines (specify the aesthetics in each geom_* call to prevent them from being inherited). This reproduces the problem.

ggplot(data, aes(x = x, y = tc)) +
  geom_point(aes(fill = gender)) +
  geom_line(data = fitted.male, aes(colour = gender)) +
  geom_line(data = fitted.female, aes(colour = gender)) +
  scale_colour_manual(values = c("tomato", "blue")) +
  scale_fill_manual(values = c("tomato", "blue")) +
  theme_bw()
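One way to see why the points stay uncoloured is to inspect the computed layer data with layer_data(). The sketch below uses a small stand-in data set with made-up values (the name `demo` is hypothetical, not from the example above). It suggests that the fill values are mapped correctly but the point layer still uses the default shape, which has no interior to fill.

```r
library(ggplot2)

# Small stand-in data set (hypothetical values, only for illustration)
demo <- data.frame(x = 1:6,
                   tc = c(5.1, 4.8, 5.3, 6.0, 6.2, 5.9),
                   gender = factor(rep(c("Female", "Male"), each = 3)))

p <- ggplot(demo, aes(x = x, y = tc)) +
  geom_point(aes(fill = gender)) +
  scale_fill_manual(values = c("tomato", "blue"))

# layer_data() returns the computed aesthetics of the first layer
pts <- layer_data(p, 1)
unique(pts$shape)  # the default point shape
unique(pts$fill)   # fill values are mapped, but this shape does not draw them
```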

To fix this, set the shape argument in geom_point to a point shape that can be filled (21 to 25).

ggplot(data, aes(x = x, y = tc)) +
  geom_point(aes(fill = gender), shape = 21) +
  geom_line(data = fitted.male, aes(colour = gender)) +
  geom_line(data = fitted.female, aes(colour = gender)) +
  scale_colour_manual(values = c("tomato", "blue")) +
  scale_fill_manual(values = c("tomato", "blue")) +
  theme_bw()
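Shape 21 draws its outline with the colour aesthetic, which defaults to black. If a matching outline is preferred, one possible variation (a sketch on a hypothetical stand-in data set `demo`, not part of the original answer) is to map both fill and colour on the points:

```r
library(ggplot2)

# Small stand-in data set (hypothetical values, only for illustration)
demo <- data.frame(x = 1:6,
                   tc = c(5.1, 4.8, 5.3, 6.0, 6.2, 5.9),
                   gender = factor(rep(c("Female", "Male"), each = 3)))

# Map both fill and colour so the outline of shape 21 matches its interior
p2 <- ggplot(demo, aes(x = x, y = tc)) +
  geom_point(aes(fill = gender, colour = gender), shape = 21) +
  scale_colour_manual(values = c("tomato", "blue")) +
  scale_fill_manual(values = c("tomato", "blue")) +
  theme_bw()

# Inspect the computed aesthetics of the point layer
pts2 <- layer_data(p2, 1)
```

Whether the black outline or the matching outline reads better is a matter of taste; the black outline can help the points stand out against the fitted lines.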

Created on 2021-09-19
