How to remove zig-zag pattern in marginal distribution plot of integer values in R?

Time:02-22

I am including marginal distribution plots on a scatterplot of a continuous variable and an integer variable. However, the marginal distribution plot for the integer variable (y-axis) shows a zig-zag pattern because the y-values are all integers. Is there any way to increase the "width" (not sure that's the right term) of the bins/values the function calculates the distribution density over?

The goal is to get rid of that zig-zag pattern that develops because the y-values are integers.

library(GlmSimulatoR)
library(ggplot2)
library(patchwork)

### Create right-skewed dataset that has one continuous variable and one integer variable
set.seed(123)
df1 <- data.frame(matrix(ncol = 2, nrow = 1000))
x <- c("int","cont")
colnames(df1) <- x
df1$int <- round(rgamma(1000, shape = 1, scale = 1),0)
df1$cont <- round(rgamma(1000, shape = 1, scale = 1),1)

p1 <- ggplot(data = df1, aes(x = cont, y = int)) +
  geom_point(shape = 21, size = 2, color = "black", fill = "black", stroke = 1, alpha = 0.4) +
  xlab("Continuous Value") +
  ylab("Integer Value") +
  theme_bw() +
  theme(panel.grid = element_blank(),
        text = element_text(size = 16),
        axis.text.x = element_text(size = 16, color = "black"),
        axis.text.y = element_text(size = 16, color = "black"))

dens1 <- ggplot(df1, aes(x = cont)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none")
dens2 <- ggplot(df1, aes(x = int)) +
  geom_density(alpha = 0.4) +
  theme_void() +
  theme(legend.position = "none") +
  coord_flip()

dens1 + plot_spacer() + p1 + dens2 +
  plot_layout(ncol = 2, nrow = 2, widths = c(6,1), heights = c(1,6))

CodePudding user response:

From ?geom_density:

adjust: A multiplicate [sic] bandwidth adjustment. This makes it possible to adjust the bandwidth while still using the a bandwidth estimator. For example, ‘adjust = 1/2’ means use half of the default bandwidth.

So as a start try e.g. geom_density(..., adjust = 2) (bandwidth twice as wide as default) and go from there.
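
To see what adjust does to the integer data from the question, here is a minimal sketch. It assumes the same simulated integers as above; count_peaks is a hypothetical helper written for this illustration, and adjust = 3 is only an example value, not a recommendation. The helper calls stats::density(), which is what geom_density uses to compute the curve, and counts the local maxima of the estimate.

```r
library(ggplot2)

# Same simulated integer variable as in the question
set.seed(123)
int_vals <- round(rgamma(1000, shape = 1, scale = 1), 0)

# Hypothetical helper: number of local maxima in the kernel density
# estimate for a given bandwidth multiplier
count_peaks <- function(x, adjust) {
  d <- density(x, adjust = adjust)
  sum(diff(sign(diff(d$y))) < 0)
}

peaks_default <- count_peaks(int_vals, adjust = 1)  # default bandwidth
peaks_smooth  <- count_peaks(int_vals, adjust = 3)  # 3x the default bandwidth

# Marginal density for the integer variable with the wider bandwidth
# (adjust = 3 is an illustrative value; tune it by eye)
dens2 <- ggplot(data.frame(int = int_vals), aes(x = int)) +
  geom_density(alpha = 0.4, adjust = 3) +
  theme_void() +
  theme(legend.position = "none") +
  coord_flip()
```

Comparing peaks_default with peaks_smooth gives a rough check of how much of the zig-zag is left. Increase adjust until the bumps at each integer disappear, but keep in mind that a large value also flattens the real shape of the distribution.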
