ggplot2 Set geom_point Size according to a Factor

Time:12-15

I am trying to set the size of geom_point according to a factor. I know it is not advised, but my data is extremely unbalanced (the minimum value is 6 while the maximum is larger than 10,000).

I am trying to make the size of the points reflect the total sample sizes of studies. I divided total sample sizes into 6 levels: less than 100; 100 to 500; 500 to 1,000; 1,000 to 5,000; 5,000 to 10,000; and more than 10,000.

Here is my attempt:

library(dplyr)
library(forcats)
library(ggplot2)

rct_findings <- findings %>%
  mutate(
    # Bin the sample sizes into six labelled ranges
    Sample_Size_Range = case_when(
      0 < Outcome_Sample_Size & Outcome_Sample_Size <= 100 ~ "0 < n <= 100",
      100 < Outcome_Sample_Size & Outcome_Sample_Size <= 500 ~ "100 < n <= 500",
      500 < Outcome_Sample_Size & Outcome_Sample_Size <= 1000 ~ "500 < n <= 1,000",
      1000 < Outcome_Sample_Size & Outcome_Sample_Size <= 5000 ~ "1,000 < n <= 5,000",
      5000 < Outcome_Sample_Size & Outcome_Sample_Size <= 10000 ~ "5,000 < n <= 10,000",
      10000 < Outcome_Sample_Size ~ "10,000 < n"),
    # Put the levels in ascending order
    Sample_Size_Range = fct_relevel(Sample_Size_Range, c("0 < n <= 100", "100 < n <= 500", "500 < n <= 1,000", "1,000 < n <= 5,000", "5,000 < n <= 10,000", "10,000 < n")))

ggplot(rct_findings, aes(x = Effect_Size_Study, y = F_test_var_stat, size = as_factor(Sample_Size_Range))) +
  geom_point()

The error message I got is:

Error in grid.Call.graphics(C_setviewport, vp, TRUE) :
  non-finite location and/or size for viewport
In addition: Warning messages:
1: Using size for a discrete variable is not advised.
2: Removed 16 rows containing missing values (geom_point).

Does anyone have any suggestions on how to fix this?

Answer:

This seems like a good use case for a binned size scale, which bins the continuous variable for you, so you don't need to convert it to a factor at all.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.1

# Dummy data
rct_findings <- data.frame(
  Effect_Size_Study = rnorm(100),
  F_test_var_stat = runif(100),
  Outcome_Sample_Size = runif(100, min = 6, max = 10000)
)

ggplot(rct_findings, aes(x = Effect_Size_Study, y = F_test_var_stat)) +
  geom_point(aes(size = Outcome_Sample_Size)) +
  scale_size_binned_area(
    limits = c(0, 10000),
    breaks = c(0, 100, 500, 1000, 5000, 10000)
  )

Created on 2021-12-14 by the reprex package (v2.0.1)
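
If you do want discrete classes with the labels from the question, here is a minimal sketch of one alternative (not part of the reprex above). It assumes the same column names as the question, uses made-up dummy data, and picks arbitrary point sizes 1 to 6. cut() builds the factor in a single step, rows without a class are dropped before plotting, and scale_size_manual() assigns one size per level.

```r
library(ggplot2)

set.seed(1)  # reproducible dummy data

# Hypothetical dummy data; column names follow the question
rct_findings <- data.frame(
  Effect_Size_Study = rnorm(100),
  F_test_var_stat = runif(100),
  Outcome_Sample_Size = round(runif(100, min = 6, max = 20000))
)

# cut() with right = TRUE builds (lower, upper] intervals,
# matching the "lower < n <= upper" classes in the question
rct_findings$Sample_Size_Range <- cut(
  rct_findings$Outcome_Sample_Size,
  breaks = c(0, 100, 500, 1000, 5000, 10000, Inf),
  labels = c("0 < n <= 100", "100 < n <= 500", "500 < n <= 1,000",
             "1,000 < n <= 5,000", "5,000 < n <= 10,000", "10,000 < n"),
  right = TRUE
)

# Drop rows with no class (missing or non-positive sample sizes)
plot_data <- rct_findings[!is.na(rct_findings$Sample_Size_Range), ]

# One point size per factor level, in level order (sizes are arbitrary)
p <- ggplot(plot_data, aes(x = Effect_Size_Study, y = F_test_var_stat,
                           size = Sample_Size_Range)) +
  geom_point(alpha = 0.6) +
  scale_size_manual(values = c(1, 2, 3, 4, 5, 6), name = "Sample size")

print(p)
```

Because the sizes are chosen by hand, they only rank the classes and are not proportional to the sample sizes. The binned scale above keeps point area tied to the data.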
