Plotting the theoretical distribution of exponential with a minimum on facets (ggh4x)-CodePudding

Users of a Shiny app can test data sets for Poisson, normality, and exponentiality. I am returning the results of the statistical test they chose. In addition, I thought it would be nice to plot the density from the data along with the theoretical distribution. They could be testing multiple sets of data at once, so I am faceting the plot.

From ggplot add Normal Distribution while using `facet_wrap` I found the really great ggh4x package. However, since this could be industry data, there may be a minimum that is not zero.

The problem is that theodensity(distri="exp") uses dexp which doesn't account for a minimum number, so the theoretical distribution plot doesn't match the data.

How can I tell the stat_theodensity that there is an xmin for each facet, which is the min of the data in the facet? I see that fitdistrplus can use different methods to fit an exponential curve, and that, for example, method="mse" would work. Is there a way to pass this through stat_theodensity?

library(ggh4x)

#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/100) 100
data2 <- rexp(n = 500,rate = 1/250) 500
data <- c(data1,data2)

#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)

#make the data for plotting
plot_dat <- data.frame(ID,data)

#make the graph
p <- ggplot(data = plot_dat, aes(x=data)) 
  geom_density() 
  stat_theodensity(distri = "exp") 
  facet_wrap(facets = ~ID,scales = "free")
p

#what the first point of the graphs should be
dexp(x = 100-100,rate = 1/100)
#[1] 0.01
dexp(x = 500-500,rate = 1/250)
#[1] 0.004

********EDIT

OK I am getting closer. The following code works, but only for the second pass through the loop. If I change the numbers around for data1 and data2, it is always only the second one that plots the theoretical distribution.

I did ggplot_build after the first loop and it gives an error in fitdist(), which is code 100. I don't know why it would always fail on the first one but not on the second one, even with the same data.

Any ideas?

#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/250) 500
data2 <- rexp(n = 500,rate = 1/100) 250
data <- c(data1,data2)

#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)

#make the data for plotting
plot_dat <- data.frame(ID,data)

#make the graph
p <- ggplot(data = plot_dat, aes(x=data)) 
  geom_density(color="red")

#loop through sets and add facets
for (set in unique(plot_dat$ID)){
  xmin <- min(plot_dat$data[ID == set])
  p<-p 
    stat_theodensity(
      data = ~subset(.x, ID == set),
      aes(x = stage(data - xmin, after_stat = x   xmin)),
      distri = "exp"
    )
}

  #stat_theodensity(distri = "exp") 
p<-p 
  facet_wrap(facets = ~ID,scales = "free")
p

CodePudding user response：

I don't know about the statistics of your problem, but if the issue is subtracting a number before calculating the density and afterwards adding it, you might do that with stage(). I couldn't find a more elegant way than hardcoding these values for each set separately, but I'd be happy to hear about more creative solutions.

library(ggh4x)
#> Loading required package: ggplot2

#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/100) 100
data2 <- rexp(n = 500,rate = 1/250) 500
data <- c(data1,data2)

#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)

#make the data for plotting
plot_dat <- data.frame(ID,data)

#make the graph
ggplot(data = plot_dat, aes(x=data)) 
  geom_density()  
  stat_theodensity(
    data = ~ subset(.x, ID == "Set 1"),
    aes(x = stage(data - 100, after_stat = x   100)),
    distri = "exp"
  )  
  stat_theodensity(
    data = ~ subset(.x, ID == "Set 2"),
    aes(x = stage(data - 500, after_stat = x   500)),
    distri = "exp"
  )  
  facet_wrap(facets = ~ID,scales = "free")

^{Created on 2022-09-26 by the reprex package (v2.0.1)}

EDIT

I think OP's update had a problem with non-standard evaluation. It should work when you use a lapply() loop instead of a for-loop because then xmin is not a global variable that might be mistakingly looked up.

library(ggh4x)
#> Loading required package: ggplot2
library(ggplot2)

#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/250) 500
data2 <- rexp(n = 500,rate = 1/100) 250
data <- c(data1,data2)

#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)

#make the data for plotting
plot_dat <- data.frame(ID,data)

#make the graph
p <- ggplot(data = plot_dat, aes(x=data)) 
  geom_density(color="red")  
  facet_wrap(facets = ~ ID, scales = "free")

#loop through sets and add facets
p   lapply(unique(plot_dat$ID), function(i) {
  xmin <- min(plot_dat$data[plot_dat$ID == i])
  stat_theodensity(
    data = ~ subset(.x, ID == i),
    aes(x = stage(data - xmin, after_stat = x   xmin)),
    distri = "exp"
  )
})

^{Created on 2022-09-27 by the reprex package (v2.0.1)}