How can I modify this scatterplot to include a hierarchy based on a 3rd column of data?

Time:02-21

I want to make a scatterplot of PM2.5 against life expectancy, and within it I want 5 subcategories based on the GDP data (5 differently coloured sets of points and lines, from high to low GDP). How would I modify my current code to do this (or something similar)? Code and data are below; any help is much appreciated.

library(ggplot2)

plot <- ggplot(dat6, aes(x = log(PM2.5), y = log(Life_Expectancy))) +
  geom_point(colour = 'blue') +
  stat_smooth(method = "lm", col = "red") +
  xlab("log(Concentration of PM2.5)") +
  ylab("log(Life Expectancy)") +
  ggtitle("Relationship between Life expectancy and PM2.5")



dat6
                 Country Life_Expectancy         GDP     PM2.5
1                Afghanistan        60.38333   1788.3152 53.933333
2                    Albania        77.03333  10642.3801 20.408333
3                    Algeria        75.16667  13674.2199 31.521667
4                     Angola        51.96667   6770.9149 37.346667
5        Antigua and Barbuda        75.98333  20893.5925 20.415000
6                  Argentina        75.93333  19838.7166 11.893333
7                    Armenia        74.26667   7728.3425 33.143333
8                  Australia        82.36667  43862.4894  7.338333
9                    Austria        84.00000  46586.1927 14.303333
10                Azerbaijan        72.00000  16804.9607 20.308333

Answer:

Here is an example of what the question asks for.

cut is used to create a new factor column, GDP_Level, from the vector of break points brks. The levels are given names ranging from "Very Low" to "Very High".
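To see how cut() sorts values into the five levels, here is a small sketch using a few GDP values from dat6 plus one extra value (10000, added only for illustration) that sits exactly on a break point. By default cut() uses right-closed intervals such as (5000, 10000], so that value should land in "Low" rather than "Medium".

```r
# Break points and labels as in the answer below
brks <- c(0, 5000, 10000, 20000, 40000, Inf)
lbls <- c("Very Low", "Low", "Medium", "High", "Very High")

# Five GDP values from dat6, plus 10000, which lies exactly on a break
gdp <- c(1788.3152, 6770.9149, 10642.3801, 20893.5925, 46586.1927, 10000)

# cut() returns a factor; intervals are (0, 5000], (5000, 10000], ...
gdp_level <- cut(gdp, breaks = brks, labels = lbls)

print(data.frame(gdp, gdp_level))
```

If you would rather have five groups of roughly equal size, one option is breaks = quantile(dat6$GDP, probs = seq(0, 1, 0.2)) together with include.lowest = TRUE, so that the country with the smallest GDP is not turned into NA.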

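As for the plot, I have removed the log transformations from the aes() mapping and applied them instead through the trans argument of both scale_x_continuous and scale_y_continuous. The points end up in the same positions, but the axis labels show the original units rather than log values.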
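A quick way to check that moving the log into the scales does not move the points: the sketch below builds both versions on a four-row subset of the data and compares the positions reported by ggplot_build(), which are on the transformed scale. The names d, p_aes and p_scale are just for illustration. Note that ggplot2 3.5.0 and later rename the trans argument to transform, so trans may produce a deprecation message there.

```r
library(ggplot2)

# Four rows taken from dat6 (only the two plotted columns)
d <- data.frame(
  PM2.5 = c(53.933333, 20.408333, 31.521667, 7.338333),
  Life_Expectancy = c(60.38333, 77.03333, 75.16667, 82.36667)
)

# Log applied inside aes(): axis labels show log values
p_aes <- ggplot(d, aes(x = log(PM2.5), y = log(Life_Expectancy))) +
  geom_point()

# Log applied by the scales: axis labels keep the original units
p_scale <- ggplot(d, aes(x = PM2.5, y = Life_Expectancy)) +
  geom_point() +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log")

# ggplot_build() reports layer data on the transformed scale,
# so both versions should place the points at the same x and y
b_aes <- ggplot_build(p_aes)$data[[1]]
b_scale <- ggplot_build(p_scale)$data[[1]]
print(all.equal(b_aes$x, b_scale$x))
print(all.equal(b_aes$y, b_scale$y))
```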

dat6 <- read.table(text = "
                 Country Life_Expectancy         GDP     PM2.5
1                Afghanistan        60.38333   1788.3152 53.933333
2                    Albania        77.03333  10642.3801 20.408333
3                    Algeria        75.16667  13674.2199 31.521667
4                     Angola        51.96667   6770.9149 37.346667
5        'Antigua and Barbuda'        75.98333  20893.5925 20.415000
6                  Argentina        75.93333  19838.7166 11.893333
7                    Armenia        74.26667   7728.3425 33.143333
8                  Australia        82.36667  43862.4894  7.338333
9                    Austria        84.00000  46586.1927 14.303333
10                Azerbaijan        72.00000  16804.9607 20.308333
", header = TRUE)

library(ggplot2)

brks <- c(0, 5000, 10000, 20000, 40000, Inf)
dat6$GDP_Level <- cut(dat6$GDP, breaks = brks, labels = c("Very Low", "Low", "Medium", "High", "Very High"))

ggplot(dat6, aes(x = PM2.5, y = Life_Expectancy, color = GDP_Level)) +
  # No fixed colour here, otherwise it overrides the GDP_Level mapping
  geom_point() +
  # One linear fit per GDP level, coloured to match its points
  stat_smooth(formula = y ~ x, method = "lm") +
  xlab("Concentration of PM2.5") +
  ylab("Life Expectancy") +
  scale_x_continuous(trans = "log") +
  scale_y_continuous(trans = "log") +
  ggtitle("Relationship between Life expectancy and PM2.5")

Created on 2022-02-21 by the reprex package (v2.0.1)
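Because ggplot2 applies scale transformations before computing stats, each coloured line from stat_smooth() should be an ordinary least-squares fit of log(Life_Expectancy) on log(PM2.5) within one GDP level. The sketch below reproduces those fits with lm() so the slopes can be inspected directly; it rebuilds dat6 with data.frame() so it runs on its own, and the names fits and res are only illustrative. With these ten countries, "Very Low" and "High" contain one country each, so no line can be fitted for them (they get NA here, and stat_smooth() should draw nothing for them), while the two-country groups are fitted exactly through their two points.

```r
# Same ten rows as dat6 in the answer
dat6 <- data.frame(
  Country = c("Afghanistan", "Albania", "Algeria", "Angola",
              "Antigua and Barbuda", "Argentina", "Armenia",
              "Australia", "Austria", "Azerbaijan"),
  Life_Expectancy = c(60.38333, 77.03333, 75.16667, 51.96667, 75.98333,
                      75.93333, 74.26667, 82.36667, 84.00000, 72.00000),
  GDP = c(1788.3152, 10642.3801, 13674.2199, 6770.9149, 20893.5925,
          19838.7166, 7728.3425, 43862.4894, 46586.1927, 16804.9607),
  PM2.5 = c(53.933333, 20.408333, 31.521667, 37.346667, 20.415000,
            11.893333, 33.143333, 7.338333, 14.303333, 20.308333)
)

brks <- c(0, 5000, 10000, 20000, 40000, Inf)
dat6$GDP_Level <- cut(dat6$GDP, breaks = brks,
                      labels = c("Very Low", "Low", "Medium", "High", "Very High"))

# Fit log(Life_Expectancy) ~ log(PM2.5) separately for each GDP level;
# groups with fewer than two countries cannot be fitted and get NA
fits <- lapply(split(dat6, dat6$GDP_Level), function(d) {
  if (nrow(d) < 2) return(c(intercept = NA_real_, slope = NA_real_))
  b <- coef(lm(log(Life_Expectancy) ~ log(PM2.5), data = d))
  c(intercept = unname(b[1]), slope = unname(b[2]))
})

# One row per GDP level, in factor-level order
res <- do.call(rbind, fits)
print(res)
```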
