How to visualise the relationship between two variables based on another one in R?

Time:11-19

I have a table (based on a .csv file named "bikes"), which contains around 10,000 rows and looks like this:

rentals  season  weatherCondition
  12        1         4
  6         4         1
  21        1         4
  4         3         1
  5         3         2
  19        1         4
  13        1         3
  10        2         4
  8         2         3
  ..        ..        ..

My task is to visualise the relationship between "rentals" and "season", broken down by "weatherCondition".

So far I have managed the following:

library(tidyverse)
ggplot(data = bikes, mapping = aes(x = season, y = rentals)) +
  geom_point(aes(color = weatherCondition)) +
  theme_bw()

The result is:

[Image: scatter plot of rentals against season, with points coloured by weather condition]

I assume that I'm heading in the right direction, but it's far from perfect.

  1. It doesn't show the data well: with around 10,000 points spread over only four season values, many of them overlap.
  2. I think it would help to show more about how the variables are related, for example some measure of correlation or association.

How would you solve this problem?

Answer:

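It sounds like you want grouped box plots. Mapping the weather condition to fill gives one box per weather condition within each season, and ggplot2 places those boxes side by side automatically.

library(tidyverse)
bikes %>%
  ggplot(aes(x = factor(season), y = rentals, fill = factor(weatherCondition))) +
  geom_boxplot() +
  labs(x = "Season", y = "Rentals", fill = "Weather condition") +
  theme_bw()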
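If the tidyverse isn't available, base R's boxplot() can draw a similar grouped plot from a formula. This is a minimal sketch that assumes the column names from the question; it rebuilds the nine sample rows as stand-in data (skip that step if the real bikes data is already loaded, otherwise it will be overwritten). With a two-variable formula, boxplot() draws one box per weather/season combination, with weather varying fastest, so each season's boxes sit next to each other. The colours are arbitrary choices.

```r
# Stand-in data: the nine sample rows shown in the question
bikes <- data.frame(
  rentals          = c(12, 6, 21, 4, 5, 19, 13, 10, 8),
  season           = c(1, 4, 1, 3, 3, 1, 1, 2, 2),
  weatherCondition = c(4, 1, 4, 1, 2, 4, 3, 4, 3)
)

# One colour per weather condition (arbitrary choice)
weather_cols <- c("lightblue", "lightgreen", "khaki", "salmon")

# One box per weatherCondition x season combination ("1.1", "2.1", ...);
# combinations that never occur are kept as empty slots
b <- boxplot(rentals ~ weatherCondition + season, data = bikes,
             col = rep(weather_cols, times = 4),
             las = 2, xlab = "weatherCondition.season", ylab = "Rentals")

legend("topright", legend = paste("weather", 1:4),
       fill = weather_cols, title = "Weather condition")
```

The returned object b holds the numbers behind the plot (b$stats for the box statistics, b$n for the group sizes), which is one way to report the relationship numerically as well as visually.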

Answer:

I would summarize the data by weather condition and season, since there are only a handful of weather conditions rather than thousands. Ideally both would be labelled with names such as "rain" or "summer"; either way, the following works.

# Total rentals for every weatherCondition/season combination that occurs in the data
res <- aggregate(rentals ~ weatherCondition + season, data = bikes, FUN = sum)

# for numerical data
  weatherCondition season rentals
1                3      1      13
2                4      1      52
3                3      2       8
4                4      2      10
5                1      3       4
6                2      3       5
7                1      4       6

# for named conditions and seasons
  weatherCondition season rentals
1            rainy   fall       6
2            foggy spring       8
3            snowy spring      10
4            rainy summer       4
5            sunny summer       5
6            foggy winter      13
7            snowy winter      52
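One way to get the named version is to turn the numeric codes into labelled factors before aggregating. The code-to-name mapping below is hypothetical: it simply reproduces the toy labels in the table above (season 1 = winter, 2 = spring, 3 = summer, 4 = fall; weather 1 = rainy, 2 = sunny, 3 = foggy, 4 = snowy), and the real coding of the bikes data may well differ. The nine sample rows from the question are rebuilt as stand-in data. Using factors rather than plain strings also keeps the seasons in calendar order instead of alphabetical order.

```r
# Stand-in data: the nine sample rows shown in the question
bikes <- data.frame(
  rentals          = c(12, 6, 21, 4, 5, 19, 13, 10, 8),
  season           = c(1, 4, 1, 3, 3, 1, 1, 2, 2),
  weatherCondition = c(4, 1, 4, 1, 2, 4, 3, 4, 3)
)

# Hypothetical code-to-name mapping, matching the toy labels above
bikes_named <- transform(
  bikes,
  season = factor(season, levels = 1:4,
                  labels = c("winter", "spring", "summer", "fall")),
  weatherCondition = factor(weatherCondition, levels = 1:4,
                            labels = c("rainy", "sunny", "foggy", "snowy"))
)

# Total rentals per weather/season combination; combinations that never
# occur in the data are dropped from the result
res_named <- aggregate(rentals ~ weatherCondition + season,
                       data = bikes_named, FUN = sum)
print(res_named)
```

Because the number of rows per combination can differ a lot, FUN = mean may be a fairer summary than FUN = sum when comparing conditions.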

Then plot the totals:

barplot(res[, "rentals"],
        names.arg = paste(res$weatherCondition, res$season, sep = "_"),
        las = 3)

[Image: bar plot of total rentals for each weather condition/season combination]

Obviously, the labels don't make much sense here in my toy example.
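An alternative that avoids the pasted labels is a grouped bar chart: seasons on the x-axis, one bar per weather condition within each season, and a legend. This is a minimal base-R sketch using the nine sample rows from the question as stand-in data; xtabs() builds the weather-by-season table of totals, which is the matrix shape barplot(beside = TRUE) expects. Combinations that never occur appear as zero-height bars.

```r
# Stand-in data: the nine sample rows shown in the question
bikes <- data.frame(
  rentals          = c(12, 6, 21, 4, 5, 19, 13, 10, 8),
  season           = c(1, 4, 1, 3, 3, 1, 1, 2, 2),
  weatherCondition = c(4, 1, 4, 1, 2, 4, 3, 4, 3)
)

# Total rentals cross-tabulated: rows = weather condition, columns = season
tab <- xtabs(rentals ~ weatherCondition + season, data = bikes)
print(tab)

# beside = TRUE draws one group of bars per column (season) and one bar per
# row (weather condition) inside each group; legend.text = TRUE builds the
# legend from the row names
barplot(tab, beside = TRUE, legend.text = TRUE,
        args.legend = list(x = "topright", title = "Weather condition"),
        xlab = "Season", ylab = "Total rentals")
```

The printed table doubles as a compact numeric summary of how rentals are distributed across seasons and weather conditions.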
