How to create a legend and change the point shape in ggplot2?

Time:03-31

I need to reproduce the following figure

Figure I need to reproduce

in RStudio for a macroeconomics project. I've been able to figure most of it out, but the parts giving me the most trouble are the legend and changing the shapes of the data points. Here is what I have so far:

How far I've gotten

Here is my code, with the data points entered manually:

# First I load up some packages

library(tidyverse)

# then I create vectors for the years and percentages for Europe and the US

year <- c(1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 200, 2010)
us_percent <- c(40.5, 40.9, 44.7, 45.1, 36.5, 33.7, 34.1, 33.4, 37.5, 42.4, 46.9, 48.8)
euro_percent <- c(45.9, 45.8, 39.5, 40.5, 33.8, 31.7, 31.6, 29.7, 29.4, 32.4, 34, 34.7)

# then I create a data frame for my vectors and name it

df <- data.frame(year, us_percent, euro_percent)

# then I create a ggplot function for the US and Europe inequality

ggplot() +
  geom_line(data=df, mapping=aes(x=year, y=euro_percent), color="cyan") +
  geom_point(data=df, mapping=aes(x=year, y=euro_percent), color="cyan") +
  geom_line(data=df, mapping=aes(x=year, y=us_percent), color="purple") +
  geom_point(data=df, mapping=aes(x=year, y=us_percent), color="purple") +
  xlim(1900, 2010) +
  ylim(25, 50) +
  labs(x="Year", y="Share of Total Pretax Income", title="Income Inequality")

I tried running various versions of scale_color_manual(), but each time my R console was left showing a "+" sign, and I don't know what to add after that. Any suggestions would be welcome. Thank you!

CodePudding user response:

Pivot your data to long format and map the new grouping column (named Currency in the code below) to the shape and color aesthetics.

It's also a good idea to pick colors, sizes and themes that make your data and the differences between series clearly visible:

ggplot(tidyr::pivot_longer(df, -1, names_to = "Currency"), 
       aes(year, value, color = Currency)) +
  geom_line() +
  geom_point(aes(shape = Currency), size = 4) +
  scale_color_manual(values = c("orange2", "deepskyblue3")) +
  theme_light() +
  xlim(1900, 2010) +
  ylim(25, 50) +
  labs(x="Year", 
       y="Share of Total Pretax Income", 
       title="Income Inequality",
       color = "", shape = "")
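
To see what the pivot step does on its own, here is a minimal sketch that rebuilds the data and inspects the long result. It assumes the year typo from the question (200) is corrected to 2000. The name df_long is only for illustration, and the value column name is the pivot_longer() default.

```r
library(tidyr)

# Same data as in the question, with the year typo (200) corrected to 2000
df <- data.frame(
  year = seq(1900, 2010, by = 10),
  us_percent = c(40.5, 40.9, 44.7, 45.1, 36.5, 33.7, 34.1, 33.4, 37.5, 42.4, 46.9, 48.8),
  euro_percent = c(45.9, 45.8, 39.5, 40.5, 33.8, 31.7, 31.6, 29.7, 29.4, 32.4, 34, 34.7)
)

# -1 means "pivot every column except the first one (year)"
df_long <- pivot_longer(df, -1, names_to = "Currency")

names(df_long)            # "year" "Currency" "value"
nrow(df_long)             # 24: 12 years x 2 series
unique(df_long$Currency)  # "us_percent" "euro_percent"
```

Because each row now carries its series name in Currency, a single aes(color = Currency, shape = Currency) mapping is enough for ggplot2 to build the legend.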


CodePudding user response:

Instead of manually specifying the colors (in your case, "cyan" and "purple"), transform your data from 'wide' to 'long' and create a new variable that categorizes each observation as "US" or "EURO". This can be done with gather() from tidyr (I also fixed the typo in your year vector, 200 to 2000):

Year <- c(1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010)
us_percent <- c(40.5, 40.9, 44.7, 45.1, 36.5, 33.7, 34.1, 33.4, 37.5, 42.4, 46.9, 48.8)
euro_percent <- c(45.9, 45.8, 39.5, 40.5, 33.8, 31.7, 31.6, 29.7, 29.4, 32.4, 34, 34.7)

df <- data.frame(Year, us_percent, euro_percent)

df_long <- df %>% gather(key = "Country", value = "Percentage", us_percent:euro_percent, factor_key = TRUE)
# Where: key is the new indicator name, value is the stacked data column 

Then, when your data is set and ready, you can simply define the functional aesthetics (shape, color) within the main ggplot() function, like so:

ggplot(
  data = df_long,
  aes(
    x = Year,
    y = Percentage, 
    color = Country, 
    shape = Country
  )
) +
  geom_point() +
  geom_line() +
  xlim(1900, 2010) +
  ylim(25, 50) +
  labs(
    x="Year", 
    y="Share of Total Pretax Income", 
    title="Income Inequality"
  ) +
  theme_bw() +
  scale_color_grey()


Now you've got less code and a bit more room to wiggle with some additional aesthetic changes, if you want.
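
As one example of such a change, here is a minimal sketch that picks the point shapes and legend labels by hand with scale_shape_manual() and scale_color_manual(). The shape codes (16 and 17), the colors, and the labels "US" and "Europe" are my own illustrative choices, not taken from the figure you are reproducing. It also builds the long data with pivot_longer() rather than gather().

```r
library(ggplot2)
library(tidyr)

Year <- seq(1900, 2010, by = 10)
us_percent <- c(40.5, 40.9, 44.7, 45.1, 36.5, 33.7, 34.1, 33.4, 37.5, 42.4, 46.9, 48.8)
euro_percent <- c(45.9, 45.8, 39.5, 40.5, 33.8, 31.7, 31.6, 29.7, 29.4, 32.4, 34, 34.7)

df_long <- pivot_longer(
  data.frame(Year, us_percent, euro_percent),
  -Year,
  names_to = "Country",
  values_to = "Percentage"
)

# Breaks fix the legend order; labels are matched to breaks by position
series_breaks <- c("us_percent", "euro_percent")
series_labels <- c("US", "Europe")

p <- ggplot(df_long, aes(x = Year, y = Percentage, color = Country, shape = Country)) +
  geom_line() +
  geom_point(size = 3) +
  # Named values tie each shape and color to a series, whatever the level order
  scale_shape_manual(
    values = c(us_percent = 16, euro_percent = 17),
    breaks = series_breaks, labels = series_labels
  ) +
  scale_color_manual(
    values = c(us_percent = "purple", euro_percent = "cyan4"),
    breaks = series_breaks, labels = series_labels
  ) +
  labs(
    x = "Year",
    y = "Share of Total Pretax Income",
    title = "Income Inequality",
    color = NULL, shape = NULL
  )

p
```

Giving both scales the same breaks, labels, and title lets ggplot2 merge them into a single legend that shows color and shape together.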

CodePudding user response:

You can use the shape = argument. In general with ggplot2 it's advisable to have a dataset in "long" format before plotting - much easier to then add a legend, if you add the stratification variables in the aes() as shapes and colours.

library(tidyr)
library(ggplot2)

legend_labels <- c("Top 10% Income share: Europe", "Top 10% Income share: US")

df |> pivot_longer(cols = ends_with("percent"),
                   names_to = "area",
                   values_to = "percent") |> 
  ggplot(aes(x = year, y = percent, colour = area)) +
  geom_point(aes(shape = area)) +
  geom_line() +
  coord_cartesian(xlim = c(1900, 2010),
                  ylim = c(25, 50)) +
  labs(x = "Year",
       y = "Share of Total Pretax Income",
       title = "Income Inequality",
       colour = "",
       shape = "") +
  scale_colour_discrete(labels = legend_labels) +
  scale_shape_manual(values = c(2, 15), labels = legend_labels) +
  theme_bw() +
  theme(legend.position = "bottom",
        panel.grid.minor.x = element_blank(),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.y = element_blank())
