I am looking to understand the best way to plot bubble charts and it is surprisingly a bit more confusing than I expected.
Take a simple data frame:
library(tidyverse)
bubble_data <- data.frame(x = c(1, 1), y = c(1, 1), size = c(9, 16), type = c("A", "B"))
> bubble_data
  x y size type
1 1 1    9    A
2 1 1   16    B
If we map the size of the plotted bubbles to the size variable, then, if done correctly, the areas of the plotted bubbles should be in the ratio of 9 to 16, or ~56%.
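As a quick check of that arithmetic (a minimal sketch; the variable names are illustrative only): because a circle's area grows with the square of its radius, an area ratio of 9:16 corresponds to a radius ratio of 3:4.

```r
# Sizes taken from bubble_data above
size <- c(9, 16)

# Target ratio of the two bubble areas: 9 / 16
area_ratio <- size[1] / size[2]

# Area is proportional to radius^2, so the radii should be in the
# ratio sqrt(9) : sqrt(16), i.e. 3 / 4
radius_ratio <- sqrt(size[1]) / sqrt(size[2])

# Squaring the radius ratio recovers the area ratio
all.equal(radius_ratio^2, area_ratio)
```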
Plotting we get something that does not look anywhere close to a 56% ratio of the areas (and the bubbles are really small):
ggplot(bubble_data, aes(x = x, y = y, size = size, color = type)) +
  geom_point(alpha = 0.3)
So we could create a new variable by scaling the underlying data by taking the square root of the original data (since area is proportional to the square of the radius), and then plot that:
bubble_data_scaled <- bubble_data %>%
  mutate(scaled_size = sqrt(size))
> bubble_data_scaled
  x y size type scaled_size
1 1 1    9    A           3
2 1 1   16    B           4
# plot with scaled size
ggplot(bubble_data_scaled, aes(x = x, y = y, size = scaled_size, color = type)) +
  geom_point(alpha = 0.3)
Which yields a plot that looks pretty much the same as the previous plot.
Finally, we could use scale_size_area (or scaled size) on the original bubble_data data set:
ggplot(bubble_data, aes(x = x, y = y, size = size, color = type)) +
  geom_point(alpha = 0.3) +
  scale_size_area()
But this does not seem to yield the correct result either, and one of the bubbles looks a bit offset.
Which brings me to my question: What is the correct way to plot a bubble chart where the areas of the bubbles reflect the correct relative sizes of the underlying data, and how can I scale the plot output so that the bubbles are easier to see?
Thanks!
CodePudding user response:
To get bubbles whose areas are scaled by size, use scale_size_area. To make them bigger, increase the max_size parameter.
ggplot(bubble_data, aes(x = x, y = y, size = size, color = type)) +
  geom_point(alpha = 0.3) +
  scale_size_area(max_size = 20, breaks = (0:5)^2)
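One way to check what each scale actually does is to read the scaled sizes back from the built plot. The following is a rough sketch, not a definitive test: drawn_area_ratio is a helper made up for this illustration, and it treats the scaled size as a diameter while ignoring the small contribution of the point stroke.

```r
library(ggplot2)

bubble_data <- data.frame(x = c(1, 1), y = c(1, 1),
                          size = c(9, 16), type = c("A", "B"))

base_plot <- ggplot(bubble_data, aes(x = x, y = y, size = size, color = type)) +
  geom_point(alpha = 0.3)

# layer_data() returns the aesthetics after the scales have been applied;
# its `size` column is the point size that is actually drawn.
default_sizes <- layer_data(base_plot)$size
area_sizes    <- layer_data(base_plot + scale_size_area(max_size = 20))$size

# Hypothetical helper: ratio of drawn areas = (ratio of drawn sizes)^2,
# treating size as a diameter (an approximation that ignores stroke).
drawn_area_ratio <- function(s) (min(s) / max(s))^2

drawn_area_ratio(default_sizes)  # default scale_size
drawn_area_ratio(area_sizes)     # scale_size_area; compare with 9 / 16
```

If ggplot2 behaves as documented, the second ratio should come out close to 9/16, because scale_size_area maps a value of 0 to a size of 0. The default scale instead maps the smallest data value to the bottom of its size range, which is why the first plot in the question looks nothing like a 56% ratio.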
As for the offset appearance, that sounds like it's due to the rendering device not using anti-aliasing. This can make lines appear jagged or shapes look asymmetric at the pixel level. The default plot window device on Windows, for instance, doesn't use anti-aliasing. You can add the ragg device and adjust the settings in RStudio to make it the default for the plot window. Or you can render with ggsave using a ragg or cairo device to get nice anti-aliasing.
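As a sketch of the ggsave route, assuming the ragg package is installed and a reasonably recent ggplot2 (one that passes the size units and resolution through to a device function): the output path below is a temporary file chosen for illustration, and the width and height are arbitrary.

```r
library(ggplot2)

bubble_data <- data.frame(x = c(1, 1), y = c(1, 1),
                          size = c(9, 16), type = c("A", "B"))

p <- ggplot(bubble_data, aes(x = x, y = y, size = size, color = type)) +
  geom_point(alpha = 0.3) +
  scale_size_area(max_size = 20)

# Illustrative output path; a temporary file so nothing gets overwritten
out_file <- tempfile(fileext = ".png")

# Render through ragg's anti-aliased PNG device (requires the ragg package)
ggsave(out_file, plot = p, device = ragg::agg_png,
       width = 5, height = 4, dpi = 300)
```

Opening the saved file should show smooth, symmetric circle edges regardless of how the on-screen plot window renders them.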
To get coordinate-scale control over the bubble size (i.e., to make a bubble "x units wide" in coordinate space), you could use ggforce::geom_circle.
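A minimal sketch of the geom_circle approach, assuming the ggforce package is installed. The radius column r is an assumption introduced here for illustration: it is chosen so that each circle's area in data units equals size. Any other rule for picking the radius would work the same way.

```r
library(ggplot2)
library(ggforce)

bubble_data <- data.frame(x = c(1, 1), y = c(1, 1),
                          size = c(9, 16), type = c("A", "B"))

# Radius in data units, chosen so that pi * r^2 equals `size`
# (an illustrative choice, not something geom_circle requires)
bubble_data$r <- sqrt(bubble_data$size / pi)

p <- ggplot(bubble_data) +
  geom_circle(aes(x0 = x, y0 = y, r = r, fill = type), alpha = 0.3) +
  coord_fixed()  # equal axis scaling keeps the circles round

p
```

Unlike geom_point, these circles live in coordinate space, so they grow and shrink with the axes when the plot is resized or zoomed.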