I have a table (based on a .csv file named "bikes"), which contains around 10,000 rows and looks like this:
rentals season weatherCondition
12 1 4
6 4 1
21 1 4
4 3 1
5 3 2
19 1 4
13 1 3
10 2 4
8 2 3
.. .. ..
My task is to visualise the relationship between "rentals" and "season" based on "weatherCondition".
So far I managed to do the following:
library(tidyverse)
ggplot(data = bikes, mapping = aes(x = season, y = rentals)) +
  geom_point(aes(color = weatherCondition)) +
  theme_bw()
The result is:
I assume that I'm heading in the right direction, but it's far from perfect.
- It doesn't visualise the data perfectly.
- I assume that providing more information regarding the correlation, the connection or the relationship between the variables would be beneficial.
How would you solve this problem?
CodePudding user response:
It seems like you probably want grouped box plots. You can add colours or fills to this if necessary.
library(tidyverse)
bikes %>%
  ggplot(aes(x = factor(season), y = rentals, group = factor(weatherCondition))) +
  geom_boxplot() +
  theme_bw()
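If the boxes do not split by weather condition within each season, mapping the weather condition to `fill` rather than `group` is one way to get side-by-side boxes. This is only a minimal sketch: the `bikes` data frame below is a small stand-in built from the sample rows in the question, not the real 10,000-row table, and the column names are assumed to match the table header.

```r
library(ggplot2)

# Stand-in for the real `bikes` data, using the sample rows shown in the question
bikes <- data.frame(
  rentals          = c(12, 6, 21, 4, 5, 19, 13, 10, 8),
  season           = c(1, 4, 1, 3, 3, 1, 1, 2, 2),
  weatherCondition = c(4, 1, 4, 1, 2, 4, 3, 4, 3)
)

# factor() makes ggplot treat the numeric codes as discrete categories
p <- ggplot(bikes, aes(x = factor(season), y = rentals,
                       fill = factor(weatherCondition))) +
  geom_boxplot() +   # boxes sharing a season are dodged side by side
  labs(x = "season", fill = "weatherCondition") +
  theme_bw()

print(p)
```

With only nine rows most boxes collapse to a single line; on the full data each season should show one box per weather condition.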
CodePudding user response:
I would summarize the data by weather condition and season, provided there are not thousands of distinct weather conditions. Ideally both would be named, like "rainy", "summer" etc. Either way, the following works.
res <- data.frame( aggregate( rentals ~ weatherCondition + season, dat, sum ))
# for numerical data
  weatherCondition season rentals
1                3      1      13
2                4      1      52
3                3      2       8
4                4      2      10
5                1      3       4
6                2      3       5
7                1      4       6
# for named conditions and seasons
  weatherCondition season rentals
1            rainy   fall       6
2            foggy spring       8
3            snowy spring      10
4            rainy summer       4
5            sunny summer       5
6            foggy winter      13
7            snowy winter      52
Plotting:
barplot( res[,"rentals"], names=apply( res[,1:2], 1, function(x) paste0(x[1],"_",x[2]) ), las=3 )
Obviously, the labels don't make much sense here in my toy example.
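For reference, the aggregation and plot can be put together as one self-contained sketch. The `dat` data frame here is rebuilt from the nine sample rows in the question, which is an assumption about what the toy example used, and the `labels` helper variable is introduced only for illustration.

```r
# Toy data rebuilt from the sample rows in the question (assumed, not the full file)
dat <- data.frame(
  rentals          = c(12, 6, 21, 4, 5, 19, 13, 10, 8),
  season           = c(1, 4, 1, 3, 3, 1, 1, 2, 2),
  weatherCondition = c(4, 1, 4, 1, 2, 4, 3, 4, 3)
)

# Total rentals for every weatherCondition/season combination present in the data
res <- aggregate(rentals ~ weatherCondition + season, data = dat, FUN = sum)

# paste0() is vectorised, so the "weather_season" labels come straight from the columns
labels <- paste0(res$weatherCondition, "_", res$season)

# las = 3 draws the bar labels vertically so they do not overlap
barplot(res$rentals, names.arg = labels, las = 3, ylab = "total rentals")
```

Using `sum` gives totals per combination; swapping in `mean` or `median` as the `FUN` argument may be more informative if the combinations have very different numbers of rows.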