Hello everyone, I have some data from which I need to create a nice histogram.
First, I used hist() to create a basic one, and after some research I found out that it uses Sturges' method to decide how many bins are needed. To make a more customizable and better-looking histogram, I tried the ggplot2 package and entered the number of bins I need manually. As you can see in the photos, the two histograms are not the same: on the y-axis, the hist() version reaches a frequency of about 60, while the ggplot version goes above that.
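For reference, a minimal sketch of how hist() picks its bins by default: nclass.Sturges() applies Sturges' rule, ceiling(log2(n) + 1), and hist() then passes that number to pretty() only as a suggestion, so the final breaks are rounded "nice" values and the actual bin count can differ from the rule. This is one likely reason a ggplot histogram with a manually chosen number of bins does not reproduce the same bars. The vector x below is placeholder data, not the data from the question.

```r
# Placeholder data; the original data set is not shown
set.seed(1)
x <- rnorm(1000)

# Sturges' rule: ceiling(log2(n) + 1) suggested bins
k <- nclass.Sturges(x)
k  # 11 for n = 1000

# hist() treats k only as a suggestion and rounds the breaks with pretty()
h <- hist(x, plot = FALSE)
h$breaks          # the break points actually used
length(h$counts)  # the number of bins actually used
```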
Additionally, I'm having a hard time getting ggplot to show proper ticks on the x-axis. I can't find any reference on how to modify the tick marks so that they align with the breaks without messing up the graph.
Any ideas and help would be really appreciated.
Photos: https://prnt.sc/greVRNoGo67T https://prnt.sc/bMl29-2Fr5BN
Answer:
One way to solve the problem is to do some pre-processing and then draw a bar plot.
The pre-processing step bins the data with cut(). This transforms the continuous variable Total_Volume into a categorical variable, but since it is done inside a pipe the transformation is temporary and the original data remain unchanged.
The breaks are equidistant and the labels are the rounded midpoints of the bins.
Note that bins with very small counts are drawn as bars too short to be visible, but they are there.
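To see what the cut() step does on its own, here is a toy sketch with made-up numbers (the vector v and the 0 to 10 range are illustrative only): the breaks are equidistant, each bin is labelled with its midpoint, and table() counts the values per bin, which is what geom_bar() then draws as bar heights.

```r
# Toy values, for illustration only
v <- c(0.5, 1.2, 3.3, 3.9, 6.1, 9.8)

# 6 equidistant break points give 5 bins of width 2: (0,2], (2,4], ...
brks <- seq(0, 10, length.out = 6)
# midpoint of each bin: upper break minus half the bin width
labs <- brks[-1] - diff(brks) / 2   # 1 3 5 7 9

bins <- cut(v, breaks = brks, labels = labs)
table(bins)  # counts per bin; the bin labelled 5 is empty
```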
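suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
})
# 30 equidistant break points spanning the data, padded by 1 on each side
brks <- seq(min(x) - 1, max(x) + 1, length.out = 30)
# label each bin with its rounded midpoint
labs <- round(brks[-1] - diff(brks)/2)
data.frame(Total_Volume = x) %>%
  # temporarily turn the continuous variable into a factor of bins
  mutate(Total_Volume = cut(Total_Volume, breaks = brks, labels = labs)) %>%
  ggplot(aes(Total_Volume)) +
  geom_bar(width = 1) +
  # rotate the x-axis labels so they don't overlap
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))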
Created on 2022-10-02 with reprex v2.0.2
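An alternative (a different technique from the cut() approach above) that keeps a continuous x-axis is to pass explicit breaks to geom_histogram() and reuse the same vector for the tick marks via scale_x_continuous(breaks = ...). A minimal sketch, assuming placeholder data in a column y standing in for Total_Volume: taking the breaks from hist(plot = FALSE) should make the bar heights match the base R histogram, since both count right-closed intervals here.

```r
library(ggplot2)

# Placeholder data standing in for Total_Volume
set.seed(1)
df <- data.frame(y = rnorm(1000, mean = 50, sd = 10))

# Reuse the break points that base R's hist() would choose
h <- hist(df$y, plot = FALSE)
brks <- h$breaks

p <- ggplot(df, aes(y)) +
  # the same breaks give the same bins (right-closed, like hist())
  geom_histogram(breaks = brks, closed = "right",
                 colour = "white", fill = "grey40") +
  # one tick at every bin edge, so the ticks line up with the bars
  scale_x_continuous(breaks = brks)

print(p)
```

With real data you may want to thin the tick vector (for example brks[c(TRUE, FALSE)]) if the labels overlap.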
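Data creation code (it defines the vector x used above):
set.seed(2022)
n <- 1e6
# non-central chi-squared sample centred around 5.5e6
x <- rchisq(n, df = 1, ncp = 5.5e6)
# replace the values above 5.5e6 with normal noise around 5.5e6
i <- x > 5.5e6
x[i] <- rnorm(sum(i), mean = 5.5e6, sd = 1e4)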