Scatter plot with ggplot, using indexing to plot subsets of the same variable on x and y axis

Time:12-01

I'm working with a subset of weather data for Heathrow, downloaded from the Met Office. This data set contains no missing values.

Using ggplot, I'd like to create a scatter plot for the maximum temperature (tmax) for Heathrow, with 2018 data plotted against 2019 data (see below for example). There are 12 data points for both 2018 and 2019.

I've attempted this with the code below, but it does not work. This appears to be due to the indexing, as the code works fine when I don't use the indexes within the aes() function.

How can I get this to work?

Index2018 <- which(HeathrowData$Year == 2018)
Index2019 <- which(HeathrowData$Year == 2019)

scatter <- ggplot(HeathrowData, aes(tmax[Index2018], tmax[Index2019]))
scatter + geom_point()
scatter + geom_point(size = 2) + labs(x = "2018", y = "2019")

[Image: example of the desired scatter plot]

Answer:

As your data is in long format, you need some data wrangling to put the values for each year in separate columns, i.e. you have to reshape your data to wide format:

Using some random fake data:

library(dplyr)
library(tidyr)
library(ggplot2)

# Example data
set.seed(123)

HeathrowData <- data.frame(
  Year = rep(2017:2019, each = 12),
  tmax = runif(36)
)

# Select, Filter, Convert to Wide
HeathrowData <- HeathrowData %>% 
  select(Year, tmax) %>% 
  filter(Year %in% c(2018, 2019)) %>% 
  group_by(Year) %>% 
  mutate(id = row_number()) %>% 
  ungroup() %>% 
  pivot_wider(names_from = Year, values_from = tmax, names_prefix = "y")

ggplot(HeathrowData, aes(y2018, y2019)) +
  geom_point(size = 2) +
  labs(x = "2018", y = "2019")
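
If you would rather not reshape, a possible alternative is to build a small two-column data frame from the two subsets and plot that. This is only a sketch: it assumes both years have the same number of rows, in the same month order (the question says there are 12 points for each year), and the names plot_data, tmax2018 and tmax2019 are made up for illustration.

```r
library(ggplot2)

# Example data in long format (same fake data as above)
set.seed(123)

HeathrowData <- data.frame(
  Year = rep(2017:2019, each = 12),
  tmax = runif(36)
)

# One column per year; assumes the rows of each year are in the same month order
plot_data <- data.frame(
  tmax2018 = HeathrowData$tmax[HeathrowData$Year == 2018],
  tmax2019 = HeathrowData$tmax[HeathrowData$Year == 2019]
)

# Each aesthetic now has one value per row of plot_data
p <- ggplot(plot_data, aes(tmax2018, tmax2019)) +
  geom_point(size = 2) +
  labs(x = "2018", y = "2019")

p
```

The original attempt most likely fails because ggplot expects each aesthetic to have either length 1 or one value per row of the data passed to ggplot(), while tmax[...] yields only 12 values against the full data frame.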
