Users of a Shiny app can test data sets for Poisson, normality, and exponentiality. I am returning the results of the statistical test they chose. In addition, I thought it would be nice to plot the density from the data along with the theoretical distribution. They could be testing multiple sets of data at once, so I am faceting the plot.
From ggplot add Normal Distribution while using `facet_wrap` I found the really great ggh4x package. However, since this could be industry data, there may be a minimum that is not zero.
The problem is that theodensity(distri="exp")
uses dexp which doesn't account for a minimum number, so the theoretical distribution plot doesn't match the data.
How can I tell the stat_theodensity that there is an xmin for each facet, which is the min of the data in the facet? I see that fitdistrplus
can use different methods to fit an exponential curve, and that, for example, method="mse"
would work. Is there a way to pass this through stat_theodensity
?
library(ggh4x)
#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/100) 100
data2 <- rexp(n = 500,rate = 1/250) 500
data <- c(data1,data2)
#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)
#make the data for plotting
plot_dat <- data.frame(ID,data)
#make the graph
p <- ggplot(data = plot_dat, aes(x=data))
geom_density()
stat_theodensity(distri = "exp")
facet_wrap(facets = ~ID,scales = "free")
p
#what the first point of the graphs should be
dexp(x = 100-100,rate = 1/100)
#[1] 0.01
dexp(x = 500-500,rate = 1/250)
#[1] 0.004
********EDIT
OK I am getting closer. The following code works, but only for the second pass through the loop. If I change the numbers around for data1 and data2, it is always only the second one that plots the theoretical distribution.
I did ggplot_build after the first loop and it gives an error in fitdist(), which is code 100. I don't know why it would always fail on the first one but not on the second one, even with the same data.
Any ideas?
#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/250) 500
data2 <- rexp(n = 500,rate = 1/100) 250
data <- c(data1,data2)
#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)
#make the data for plotting
plot_dat <- data.frame(ID,data)
#make the graph
p <- ggplot(data = plot_dat, aes(x=data))
geom_density(color="red")
#loop through sets and add facets
for (set in unique(plot_dat$ID)){
xmin <- min(plot_dat$data[ID == set])
p<-p
stat_theodensity(
data = ~subset(.x, ID == set),
aes(x = stage(data - xmin, after_stat = x xmin)),
distri = "exp"
)
}
#stat_theodensity(distri = "exp")
p<-p
facet_wrap(facets = ~ID,scales = "free")
p
CodePudding user response:
I don't know about the statistics of your problem, but if the issue is subtracting a number before calculating the density and afterwards adding it, you might do that with stage()
. I couldn't find a more elegant way than hardcoding these values for each set separately, but I'd be happy to hear about more creative solutions.
library(ggh4x)
#> Loading required package: ggplot2
#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/100) 100
data2 <- rexp(n = 500,rate = 1/250) 500
data <- c(data1,data2)
#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)
#make the data for plotting
plot_dat <- data.frame(ID,data)
#make the graph
ggplot(data = plot_dat, aes(x=data))
geom_density()
stat_theodensity(
data = ~ subset(.x, ID == "Set 1"),
aes(x = stage(data - 100, after_stat = x 100)),
distri = "exp"
)
stat_theodensity(
data = ~ subset(.x, ID == "Set 2"),
aes(x = stage(data - 500, after_stat = x 500)),
distri = "exp"
)
facet_wrap(facets = ~ID,scales = "free")
Created on 2022-09-26 by the reprex package (v2.0.1)
EDIT
I think OP's update had a problem with non-standard evaluation. It should work when you use a lapply()
loop instead of a for-loop because then xmin
is not a global variable that might be mistakingly looked up.
library(ggh4x)
#> Loading required package: ggplot2
library(ggplot2)
#generate 2 exponential distributions with xmin > 0
data1 <- rexp(n = 500,rate = 1/250) 500
data2 <- rexp(n = 500,rate = 1/100) 250
data <- c(data1,data2)
#generate a code for facets
ID1 <- c(rep("Set 1",times=500))
ID2 <- c(rep("Set 2",times=500))
ID <- c(ID1,ID2)
#make the data for plotting
plot_dat <- data.frame(ID,data)
#make the graph
p <- ggplot(data = plot_dat, aes(x=data))
geom_density(color="red")
facet_wrap(facets = ~ ID, scales = "free")
#loop through sets and add facets
p lapply(unique(plot_dat$ID), function(i) {
xmin <- min(plot_dat$data[plot_dat$ID == i])
stat_theodensity(
data = ~ subset(.x, ID == i),
aes(x = stage(data - xmin, after_stat = x xmin)),
distri = "exp"
)
})
Created on 2022-09-27 by the reprex package (v2.0.1)