I am trying to set the size of geom_point according to a factor. I know it is not advised, but my data is extremely unbalanced (the minimum value is 6, while the maximum is larger than 10,000).
I am trying to make the size of the points reflect the total sample sizes of studies. I divided total sample sizes into 6 levels: less than 100; 100 to 500; 500 to 1,000; 1,000 to 5,000; 5,000 to 10,000; and more than 10,000.
Here is my attempt:
rct_findings <- findings %>%
  mutate(
    Sample_Size_Range = case_when(
      0 < Outcome_Sample_Size & Outcome_Sample_Size <= 100 ~ "0 < n <= 100",
      100 < Outcome_Sample_Size & Outcome_Sample_Size <= 500 ~ "100 < n <= 500",
      500 < Outcome_Sample_Size & Outcome_Sample_Size <= 1000 ~ "500 < n <= 1,000",
      1000 < Outcome_Sample_Size & Outcome_Sample_Size <= 5000 ~ "1,000 < n <= 5,000",
      5000 < Outcome_Sample_Size & Outcome_Sample_Size <= 10000 ~ "5,000 < n <= 10,000",
      10000 < Outcome_Sample_Size ~ "10,000 < n"),
    Sample_Size_Range = fct_relevel(Sample_Size_Range, c("0 < n <= 100", "100 < n <= 500", "500 < n <= 1,000", "1,000 < n <= 5,000", "5,000 < n <= 10,000", "10,000 < n")))
ggplot(rct_findings, aes(x = Effect_Size_Study, y = F_test_var_stat, size = as_factor(Sample_Size_Range))) +
  geom_point()
The error message I got is:
Error in grid.Call.graphics(C_setviewport, vp, TRUE) :
  non-finite location and/or size for viewport
In addition: Warning messages:
1: Using size for a discrete variable is not advised.
2: Removed 16 rows containing missing values (geom_point).
Does anyone have any suggestions on how to fix this?
Answer:
This seems like a good use case for a binned size scale, which lets you avoid converting the variable to a factor altogether.
library(ggplot2)
# Dummy data
rct_findings <- data.frame(
Effect_Size_Study = rnorm(100),
F_test_var_stat = runif(100),
Outcome_Sample_Size = runif(100, min = 6, max = 10000)
)
ggplot(rct_findings, aes(x = Effect_Size_Study, y = F_test_var_stat)) +
  geom_point(aes(size = Outcome_Sample_Size)) +
  scale_size_binned_area(
    limits = c(0, 10000),
    breaks = c(0, 100, 500, 1000, 5000, 10000)
  )
Created on 2021-12-14 by the reprex package (v2.0.1)
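If the six ranges from the question are still wanted as a column (for example, to check how many studies fall into each bin, or to spot values that end up as NA), base R's cut() can build them in one step instead of the case_when() chain. This is only a minimal sketch with made-up sample sizes; the vector `n` and the name `size_range` are illustrative and not taken from the original data.

```r
# Hypothetical sample sizes, including a missing value and one above 10,000
n <- c(6, 100, 101, 750, 4200, 10000, 25000, NA)

# Bin edges and labels mirroring the six ranges described in the question
breaks <- c(0, 100, 500, 1000, 5000, 10000, Inf)
labels <- c("0 < n <= 100", "100 < n <= 500", "500 < n <= 1,000",
            "1,000 < n <= 5,000", "5,000 < n <= 10,000", "10,000 < n")

# cut() uses right-closed intervals (a, b] by default, which matches
# the "a < n <= b" conditions; ordered_result = TRUE keeps the level order
size_range <- cut(n, breaks = breaks, labels = labels, ordered_result = TRUE)

# Count observations per bin; missing values are reported separately
table(size_range, useNA = "ifany")
```

Because the levels come out already ordered, no separate fct_relevel() step should be needed. If such a factor is mapped to size, pairing it with an explicit scale such as scale_size_manual() is one option, though the binned scale above is probably the simpler route.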