I have data with large gaps between "clusters/groups" of values, and I'd like to make a histogram of it, but dividing the values into equal-sized bins has been difficult. I'd like zero (0) to have its own bin, the total number of equally spaced bins to be fewer than 8 (ideally, to avoid crowding the plot), and an extra empty bin labelled "..." to mark the large gaps between the data values. The actual dataset has 800 zeros, with maybe 5% of the values > 0. Naturally the zeros will overshadow the rest of the data, but a log transform will fix that. I just can't figure out the best way to break up the data...
Data looks like this:
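library(dplyr)    # bind_rows(), mutate(), count()
library(tibble)   # add_row()
library(forcats)  # fct_relevel()
library(ggplot2)

set.seed(123)
zero   <- rep(0, 50)                      # 50 exact zeros (same as runif(50, min = 0, max = 0))
small  <- runif(7, min = 0,   max = 0.1)
medium <- runif(5, min = 0,   max = 0.5)
high   <- runif(3, min = 1.5, max = 2.5)

# Tag each vector with its own name, then stack them into one data frame
f <- function(x) {
  data.frame(ID = deparse(substitute(x)), value = x)
}
all <- bind_rows(f(zero), f(small), f(medium), f(high))
all <- all["value"]   # keep only the numeric column, still a data frame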
My attempt:
bins <- all %>% mutate(bin = cut(all$value, breaks = c(0, seq(0.01:0.4), Inf), right = FALSE)) %>%
count(bin, name = "freq") %>%
add_row(bin = "...", freq = NA_integer_) %>%
mutate(bin = fct_relevel(bin, "...", after = 0.4))
But I get this error:
Error in `mutate()`:
! Problem while computing `bin = fct_relevel(bin, "...", after = 0.5)`.
Caused by error:
! `idx` must contain one integer for each level of `f`
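Two separate problems in the attempt seem to meet in that error. The base-R sketch below (no packages needed) shows both. First, seq(0.01:0.4) does not build a grid of breaks at all. Second, fct_relevel()'s after argument is a position counted in whole levels; base append() uses the same "insert after the k-th element" convention, so it is used here as a stand-in. With append(), a fractional after duplicates one level and drops another; assuming fct_relevel() builds its new level order the same way (an assumption, not something its documentation states), that would explain the complaint that idx needs one integer per level. The level names in lv are illustrative.

```r
# 1) The breaks. `:` steps by 1, so 0.01:0.4 is just 0.01, and seq() called
#    with a single length-one number x returns 1:x, which here is simply 1.
inner      <- 0.01:0.4                     # 0.01
bad_breaks <- c(0, seq(0.01:0.4), Inf)     # 0, 1, Inf: far too few breaks
# Spelling out from/to/by gives an actual grid of break points
good_breaks <- c(0, 0.01, seq(0.1, 0.4, by = 0.1), Inf)
print(good_breaks)

# 2) The position. append(x, values, after = k) inserts after the k-th element,
#    so after = 3 puts "..." in 4th place (after = 0 would put it first).
lv     <- c("0", "0.01-0.1", "0.1-0.2", "0.2-0.3")
new_lv <- append(lv, "...", after = 3)
print(new_lv)   # "0" "0.01-0.1" "0.1-0.2" "..." "0.2-0.3"

# A fractional position is silently mangled: "0" appears twice and "0.2-0.3" is lost
bad_lv <- append(lv, "...", after = 0.4)
print(bad_lv)   # "0" "..." "0" "0.01-0.1" "0.1-0.2"
```

Using whole numbers for after, or declaring the levels in the wanted order when the factor is created, avoids the problem.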
This is not equally spaced, but I'm looking for something like this as labels for my plot:
levels(bins$bin) <- c("0", "0.01-0.05", "0.05-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "...", "2.0+")
ggplot(bins, aes(x = bin, y = freq, fill = bin)) +
  geom_histogram(stat = "identity", colour = "black")
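A minimal base-R sketch of the layout described in the question, under assumptions that are mine rather than the asker's: bins of width 0.1 over (0, 0.5], a gap from 0.5 to 1.5 shown as "...", and one open-ended "1.5+" bin (the simulated high values lie between 1.5 and 2.5). Instead of adding a row and releveling afterwards, it declares every level, including "...", up front, so table() returns the bins in the wanted order and keeps empty ones. The names width, gap_start and tail_start are illustrative.

```r
set.seed(123)
# Same simulated data as in the question, built with base R only
value <- c(rep(0, 50),
           runif(7, min = 0,   max = 0.1),
           runif(5, min = 0,   max = 0.5),
           runif(3, min = 1.5, max = 2.5))

# Assumed layout (adjust to your data):
#   zeros get their own bin, (0, gap_start] is split into equal-width bins,
#   the empty range (gap_start, tail_start) becomes a "..." placeholder,
#   and everything >= tail_start goes into one open-ended bin.
width      <- 0.1
gap_start  <- 0.5
tail_start <- 1.5
edges        <- seq(0, gap_start, by = width)
small_labels <- sprintf("%.1f-%.1f", head(edges, -1), edges[-1])
level_order  <- c("0", small_labels, "...", paste0(tail_start, "+"))

bin <- rep(NA_character_, length(value))
bin[value == 0] <- "0"
mid <- value > 0 & value <= gap_start
bin[mid] <- as.character(cut(value[mid], breaks = edges, labels = small_labels))
bin[value >= tail_start] <- paste0(tail_start, "+")
if (anyNA(bin)) stop("some values fall inside the gap; move gap_start or tail_start")

# Declaring every level up front keeps empty bins and the "..." slot, in order,
# so no releveling or positional renaming is needed afterwards.
bins <- as.data.frame(table(bin = factor(bin, levels = level_order)),
                      responseName = "freq")
bins$freq[bins$bin == "..."] <- NA   # placeholder: no bar, only a slot on the axis
print(bins)
```

The resulting bins data frame has the same bin/freq columns as in the question, so it should drop into the ggplot() call above; geom_col() is equivalent to geom_bar(stat = "identity"), and scale_x_discrete(drop = FALSE) keeps every declared level on the x axis.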
Answer:
You can use cut() directly inside ggplot():
library(ggplot2)

ggplot(all, aes(cut(value, breaks = c(0, 0.25, 0.5, 3), include.lowest = TRUE))) +
  geom_bar() +
  scale_y_log10() +
  labs(x = "value")
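In that answer, include.lowest = TRUE closes the first interval as [0, 0.25], so the zeros share a bar with the small positive values. A minimal sketch, assuming non-negative data as in the question: starting the breaks at -Inf gives the zeros their own interval (-Inf, 0] under cut()'s default right = TRUE, and explicit labels replace the interval notation. The same cut() call can go inside aes() as in the answer; here it runs on a plain vector so the counts can be checked.

```r
set.seed(123)
value <- c(rep(0, 50), runif(7, 0, 0.1), runif(5, 0, 0.5), runif(3, 1.5, 2.5))

# right = TRUE (the default) makes the first interval (-Inf, 0],
# which for non-negative data contains exactly the zeros
binned <- cut(value,
              breaks = c(-Inf, 0, 0.25, 0.5, Inf),
              labels = c("0", "0-0.25", "0.25-0.5", "0.5+"))
counts <- table(binned)
print(counts)
```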
Answer:
This worked for me (using my own data):
library(dplyr)
library(tibble)
library(forcats)
library(ggplot2)

bins <- WET %>%
  mutate(bin = cut(den, breaks = c(0, seq(0.001, 0.225, 0.15), 0.255, 0.3, Inf), right = FALSE)) %>%
  count(bin, name = "freq") %>%                     # frequency table; empty bins are dropped by default
  add_row(bin = "...", freq = NA_integer_) %>%      # empty placeholder row for the "..." gap
  mutate(bin = fct_relevel(bin, "...", after = 3))  # move "..." after the 3rd level, i.e. into 4th position
# `after` must be a whole number: fct_relevel(f, "a", after = 2) puts "a" in 3rd position
levels(bins$bin) <- c("0", "0.001-0.15", "0.15-0.255", "...", "0.3+")  # renames levels by position

ggplot(bins, aes(x = bin, y = freq, fill = bin)) +
  geom_bar(stat = "identity", colour = "black") +
  geom_text(aes(label = freq), vjust = -0.5) +
  scale_y_continuous(limits = c(0, 800), expand = expansion(mult = c(0, 0.05))) +
  scale_fill_brewer(name = "Density", palette = "Greys",
                    breaks = c("0", "0.001-0.15", "0.15-0.255", "0.3+")) +  # only show these legend keys (exclude "...")
  labs(title = "Wet seasons - Pink shrimp density (no./m2)", x = "Density range", y = "Frequency") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size = 9, face = "bold"),
        axis.title = element_text(size = 13, face = "bold"),  # axis titles
        axis.title.x = element_text(vjust = -3),               # move the x-axis title away from the plot
        panel.border = element_rect(color = "black", fill = NA, linewidth = 1),
        plot.margin = margin(t = 20,   # top margin
                             r = 50,   # right margin
                             b = 40,   # bottom margin
                             l = 10))  # left margin
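The levels(bins$bin) <- c(...) step above renames levels by position, so it only stays correct while the level order and the set of non-empty bins stay the same (count() drops empty bins unless .drop = FALSE). A minimal base-R sketch of an alternative: a hypothetical helper, interval_labels(), builds the labels from the break points and hands them to cut(), so each label stays attached to its interval and empty bins still appear in table(). Relabelling the first bin as "0" assumes, as in this answer's data, that no positive value falls below 0.001.

```r
# Hypothetical helper: "a-b" for [a, b), and "a+" when the upper break is Inf
interval_labels <- function(breaks) {
  lo <- head(breaks, -1)
  hi <- breaks[-1]
  ifelse(is.infinite(hi), paste0(lo, "+"), paste0(lo, "-", hi))
}

brks       <- c(0, 0.001, 0.151, 0.255, 0.3, Inf)  # the breaks this answer ends up using
bin_labels <- interval_labels(brks)
bin_labels[1] <- "0"   # assumption: [0, 0.001) only ever holds the zeros

binned <- cut(c(0, 0, 0.05, 0.2, 0.5), breaks = brks, labels = bin_labels, right = FALSE)
print(table(binned))   # the empty [0.255, 0.3) bin is still listed, with count 0
```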