Home > Mobile >  R cdplot() - does the right axis show probability or density?
R cdplot() - does the right axis show probability or density?

Time:05-10

Reproducible data:

## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## CD plot
cdplot(fail ~ temperature)

The documentation for cdplot says:

cdplot computes the conditional densities of x given the levels of y weighted by the marginal distribution of y. The densities are derived cumulatively over the levels of y. The conditional probabilities are not derived by discretization (as in the spinogram), but using a smoothing approach via density.The conditional density functions (cumulative over the levels of y) are returned invisibly.

So on the plot where x = 63, y = 0.4 (approximately). Is this probability, or probability density? I am confused by the documentation as to what is calculated, what is returned and what is plotted.

CodePudding user response:

The plot shows the probability of an outcome for a given temperature.

What the docs are saying is that a standard density distribution is calculated for temperature measurements when fail is "yes", and a density is worked out separately for temperature when fail is no. If we divide the density of "yes" temperatures by the density of "no" temperatures, then we will get the odds of drawing a "yes" at a given temperature. We can then convert the odds to probabilities.

To show this is the case, let's see the cdplot:

cdplot(fail ~ temperature)

enter image description here

Now let's calculate the probabilities from the marginal densities manually and plot. We should get a near-identical shape to our curve

yes   <- density(temperature[fail == "yes"], from = 55, to = 80)
no    <- density(temperature[fail == "no"], from = 55, to = 80)

odds  <- yes$y/no$y

probs <- odds / (odds   1)

plot(yes$x, probs, type = "l")

enter image description here

  • Related