Create custom (equally spaced) bins in ggplot for data with gaps

Time:09-22

I have data with large gaps between "clusters/groups" of values that I want to plot as a histogram, but dividing the bins into equal-sized groups has been difficult. I'd like zero (0) to have its own bin and the total number of equally spaced bins to be < 8 (ideally, to avoid crowding the plot), plus an extra empty bin labelled "..." to signify the large gaps between the data values. The actual dataset has 800 zeros, with maybe 5% of the data > 0. Naturally the zeros will overshadow the rest of the data, but a log transform will fix that. I just can't figure out the best way to break up the data...

Data looks like this:

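library(tidyverse)  # dplyr (bind_rows, %>%), forcats, tibble, ggplot2

set.seed(123)

zero <- runif(50, min=0, max=0)      # 50 exact zeros (min == max)
small <- runif(7, min=0, max=0.1)
medium <- runif(5, min=0, max=0.5)
high <- runif(3, min=1.5, max=2.5)   # the far-away cluster

# Wrap a vector in a data frame, using the variable's name as its ID
f <- function(x){
  return(data.frame(ID=deparse(substitute(x)), value=x))
}

all <- bind_rows(f(zero), f(small), f(medium), f(high))
all <- as.data.frame(all[,-1])   # drop the ID column, keeping only the values
names(all)[1] <- "value"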
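The f() / bind_rows() / column-dropping steps end up with a single numeric column named value. A shorter sketch that builds the same column by concatenating the vectors (base R only, same seed and vectors as above); all_labeled is a hypothetical extra, shown only in case the group names are useful later:

```r
set.seed(123)
zero   <- runif(50, min = 0, max = 0)    # 50 exact zeros
small  <- runif(7,  min = 0, max = 0.1)
medium <- runif(5,  min = 0, max = 0.5)
high   <- runif(3,  min = 1.5, max = 2.5)

# Same single "value" column (same values, same order) as the bind_rows() version
all <- data.frame(value = c(zero, small, medium, high))

# Hypothetical: keep the group names next to the values instead of dropping them
all_labeled <- data.frame(
  ID    = rep(c("zero", "small", "medium", "high"),
              times = c(length(zero), length(small), length(medium), length(high))),
  value = c(zero, small, medium, high)
)
```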

My attempt:

bins <- all %>% mutate(bin = cut(all$value, breaks = c(0, seq(0.01:0.4), Inf), right = FALSE)) %>%
  count(bin, name = "freq") %>%
  add_row(bin = "...", freq = NA_integer_) %>%
  mutate(bin = fct_relevel(bin, "...", after = 0.4))

But I get this error:

Error in `mutate()`:
! Problem while computing `bin = fct_relevel(bin, "...", after = 0.5)`.
Caused by error:
! `idx` must contain one integer for each level of `f`
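Two things go wrong in the attempt above. fct_relevel() expects after to be a whole-number position (how many levels the moved level should follow), so a fractional value such as 0.4 is the likely cause of the idx error. Separately, seq(0.01:0.4) does not build a sequence of breaks, because 0.01:0.4 evaluates to just 0.01. Below is a minimal sketch of one way to get the requested layout with dplyr, assuming a 0.1 bin width and one open-ended top bin (the cut points and labels are illustrative, not from the question). Zero gets its own bin through the interval (-Inf, 0], "..." is added as an empty factor level rather than as a row, and count(.drop = FALSE) keeps a row for that empty level.

```r
library(dplyr)

set.seed(123)
# Same simulated values as in the question, as one numeric column
all <- data.frame(value = c(runif(50, min = 0, max = 0), runif(7, min = 0, max = 0.1),
                            runif(5, min = 0, max = 0.5), runif(3, min = 1.5, max = 2.5)))

# Illustrative cut points: (-Inf, 0] holds only the zeros (the data are >= 0),
# then equal 0.1-wide bins up to 0.5, then one open-ended bin for the far cluster
breaks <- c(-Inf, 0, seq(0.1, 0.5, by = 0.1), Inf)
labels <- c("0", "0-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5", ">0.5")

# append()'s after is a whole-number position, like fct_relevel()'s:
# "..." goes after the 6th label, just before the open-ended bin
lvls <- append(labels, "...", after = 6)

bins <- all %>%
  mutate(bin = cut(value, breaks = breaks, labels = labels),  # right = TRUE (default)
         bin = factor(bin, levels = lvls)) %>%                # add the empty "..." level
  count(bin, name = "freq", .drop = FALSE) %>%                # keep zero-count levels
  mutate(freq = if_else(bin == "...", NA_integer_, freq))     # the gap bin gets no bar

bins
```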

This is not equally spaced, but I'm looking for something like this as labels for my plot:

levels(bins$bin) <- c("0", "0.01-0.05", "0.05-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "...", "2.0+")

ggplot(bins, aes(x = bin, y = freq, fill = bin)) +
  geom_bar(stat = "identity", colour = "black")   # bar heights are the precomputed freq
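For the plot itself, a sketch assuming the same illustrative bins as before (zero in (-Inf, 0], 0.1-wide bins, an empty "..." level, one open-ended top bin). Zero counts are turned into NA so the log scale never sees log10(0). One caveat of log axes for bars: a count of 1 maps to log10(1) = 0, so its bar has zero height.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
all <- data.frame(value = c(runif(50, min = 0, max = 0), runif(7, min = 0, max = 0.1),
                            runif(5, min = 0, max = 0.5), runif(3, min = 1.5, max = 2.5)))

# Illustrative labels; "..." is a level with no data, used as the visual gap
lvls <- c("0", "0-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5", "...", ">0.5")

bins <- all %>%
  mutate(bin = cut(value, breaks = c(-Inf, 0, seq(0.1, 0.5, by = 0.1), Inf),
                   labels = setdiff(lvls, "...")),
         bin = factor(bin, levels = lvls)) %>%
  count(bin, name = "freq", .drop = FALSE) %>%
  mutate(freq = na_if(freq, 0L))   # avoid log10(0): empty bins (incl. "...") become NA

p <- ggplot(bins, aes(x = bin, y = freq, fill = bin)) +
  geom_col(colour = "black", show.legend = FALSE) +  # heights are the precomputed counts
  scale_y_log10() +                                  # stops the zeros bar dwarfing the rest
  scale_x_discrete(drop = FALSE) +                   # keep all levels, including "...", on the axis
  labs(x = "value", y = "frequency (log10 scale)")
```

Call print(p) (or just p at the console) to draw it; ggplot2 will warn that the NA rows were removed, which is expected here.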

Answer:

You can call cut() directly inside ggplot():

ggplot(all, aes(cut(value, breaks = c(0, 0.25, 0.5, 3), include.lowest = TRUE))) +
  geom_bar() +
  scale_y_log10() +
  labs(x = "value")
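With the breaks in this answer, the first interval [0,0.25] holds the zeros together with every small value, so zero does not get its own bin. A minimal base R sketch, assuming a break at -Inf is acceptable: with cut()'s default right = TRUE, the interval (-Inf,0] contains only the zeros for non-negative data. The labels are illustrative, and the same cut() call can go inside aes() as above.

```r
set.seed(123)
# Same simulated values as in the question, as one vector
value <- c(runif(50, min = 0, max = 0), runif(7, min = 0, max = 0.1),
           runif(5, min = 0, max = 0.5), runif(3, min = 1.5, max = 2.5))

# Breaks from the answer: the zeros share [0,0.25] with the small values
lumped <- cut(value, breaks = c(0, 0.25, 0.5, 3), include.lowest = TRUE)

# One extra break below 0: (-Inf,0] now holds exactly the zeros
separate <- cut(value, breaks = c(-Inf, 0, 0.25, 0.5, 3),
                labels = c("0", "(0,0.25]", "(0.25,0.5]", "(0.5,3]"))

table(lumped)
table(separate)
```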


Answer:

This worked for me (using my own data):

bins <- WET %>%
  mutate(bin = cut(den, breaks = c(0, seq(0.001, 0.225, 0.15), 0.255, 0.3, Inf), right = FALSE)) %>%
  count(bin, name = "freq") %>%                     # build the frequency table (count column = freq)
  add_row(bin = "...", freq = NA_integer_) %>%      # add an empty "..." row (freq = NA) for the gap
  mutate(bin = fct_relevel(bin, "...", after = 3))  # move level "..." to just after the 3rd level (4th position)
levels(bins$bin) <- c("0", "0.001-0.15", "0.15-0.255", "...", "0.3+")

# In fct_relevel(f, "a", after = 2), after must be a whole number: "a" is placed after the 2nd level.

ggplot(bins, aes(x = bin, y = freq, fill = bin)) +
  geom_bar(stat = "identity", colour = "black") +
  geom_text(aes(label = freq), vjust = -0.5) +
  scale_y_continuous(limits = c(0, 800), expand = expansion(mult = c(0, 0.05))) +
  # Only show these legend values (exclude "...")
  scale_fill_brewer(name = "Density", palette = "Greys", breaks = c("0", "0.001-0.15", "0.15-0.255", "0.3+")) +
  labs(title = "Wet seasons - Pink shrimp density (no./m2)", x = "Density range", y = "Frequency") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text = element_text(size = 9, face = "bold")) +
  theme(axis.title = element_text(size = 13, face = "bold")) +   # Axis titles
  theme(axis.title.x = element_text(vjust = -3)) +               # Adjust distance of x-axis title from plot
  theme(panel.border = element_rect(color = "black", fill = NA, size = 1)) +
  theme(plot.margin = margin(t = 20,  # Top margin
                             r = 50,  # Right margin
                             b = 40,  # Bottom margin
                             l = 10)) # Left margin
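To check what after = 3 does in the answer above, here is a small forcats sketch on a made-up factor whose levels mirror that answer's labels: after counts levels, so after = 3 places "..." behind the first three levels, in 4th position.

```r
library(forcats)

# Made-up factor with the answer's bin labels, already in plotting order
f <- factor(c("0", "0.001-0.15", "0.15-0.255", "0.3+"),
            levels = c("0", "0.001-0.15", "0.15-0.255", "0.3+"))

f <- fct_expand(f, "...")              # "..." is appended as the last level
f <- fct_relevel(f, "...", after = 3)  # now placed after the 3rd level

levels(f)  # "0" "0.001-0.15" "0.15-0.255" "..." "0.3+"
```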