How to create a confidence interval with plt.fill_between inside a scatter plot

Time:01-03

I created a scatter plot from two lists of data, x = [] and y = []. In a second step, I added a linear regression line for these two lists using the following code:

(m, b) = np.polyfit(x, y, 1)
Y_Polyval = np.polyval([m, b], x)
plt.plot(x, Y_Polyval, linewidth=3, c="black")

The result is a standard scatter plot with the regression line.

Now I would like to add a 95% confidence interval around the black regression line, using plt.fill_between. I know there are many topics on this, and I have read through many of them, but I cannot adapt their code to my own code and regression line.

Adding

CI = 1.96 * np.std(y) / np.mean(y)
plt.fill_between(y, (y-CI), (y+CI), color='blue', alpha=0.1)

to my code results in the following output:

The bluish confidence interval from plt.fill_between is drawn somewhere on the left side of the image, not around the regression line. What I would like is for the confidence interval to be drawn around the black regression line. The full code is shown below:

import numpy as np
import matplotlib.pyplot as plt


# Scatter plot
x = [0.472202, 0.685151, 0.287613, 0.546364, 0.518002, 0.675128, 0.462418, 0.61817, 0.692822, 0.23433,
     0.194009, 0.720232, 0.597321, 0.625955, 0.660571, 0.737754, 0.436876, 0.689937, 0.483067, 0.646723,
     0.699367, 0.384102, 0.561493]

y = [0.131113, 0.123865, 0.150355, 0.138914, 0.140417, 0.119358, 0.130019, 0.129782, 0.113508, 0.13434,
     0.15162, 0.125768, 0.128473, 0.128056, 0.114403, 0.142878, 0.139192, 0.118033, 0.132616, 0.133043,
     0.133973, 0.146611, 0.129792]

(m, b) = np.polyfit(x, y, 1)
Y_Polyval = np.polyval([m, b], x)
plt.plot(x, Y_Polyval, linewidth=3, c="black")

CI = 1.96 * np.std(y) / np.mean(y)
plt.fill_between(y, (y-CI), (y+CI), color='blue', alpha=0.1)

plt.scatter(x, y, s=250, linewidths=2, zorder=2)

plt.show()

Answer:

Pass x as the first argument and build the band around the predicted values Y_Polyval instead of the true values y. Also sort the (x, y) pairs by x so that the area fills cleanly:

plt.fill_between(x, (Y_Polyval-CI), (Y_Polyval+CI), color='blue', alpha=0.1)

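The band in the question lands on the left because plt.fill_between(y, ...) uses the y values (roughly 0.11 to 0.15) as horizontal coordinates, while the x values of the data run from roughly 0.19 to 0.74. The following is a minimal sketch of preparing the inputs for the corrected call. The helper name band_around_fit is hypothetical (it is not part of NumPy or Matplotlib), only a few of the question's points are used, and the question's CI formula is kept unchanged:

```python
import numpy as np


def band_around_fit(x, y, half_width):
    """Return x sorted ascending plus the lower and upper band edges.

    Hypothetical helper for illustration. The band is centred on the
    fitted line, not on the raw y values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, b = np.polyfit(x, y, 1)            # slope and intercept of the fit
    x_sorted = np.sort(x)                 # ordered x gives a clean polygon
    y_fit = np.polyval([m, b], x_sorted)  # fitted values at the sorted x
    return x_sorted, y_fit - half_width, y_fit + half_width


# A few points taken from the question's data
x = [0.472202, 0.685151, 0.287613, 0.546364, 0.518002]
y = [0.131113, 0.123865, 0.150355, 0.138914, 0.140417]

CI = 1.96 * np.std(y) / np.mean(y)        # the question's formula, kept as is
x_sorted, lower, upper = band_around_fit(x, y, CI)

# With Matplotlib imported as plt, the band would then be drawn with:
# plt.fill_between(x_sorted, lower, upper, color='blue', alpha=0.1)
```

Because the fitted line depends only on x, sorting x alone is enough here; the scatter points can still be plotted in their original order.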

Full Example

import numpy as np
import matplotlib.pyplot as plt


# Scatter plot
x = [0.472202, 0.685151, 0.287613, 0.546364, 0.518002, 0.675128, 0.462418, 0.61817, 0.692822, 0.23433,
     0.194009, 0.720232, 0.597321, 0.625955, 0.660571, 0.737754, 0.436876, 0.689937, 0.483067, 0.646723,
     0.699367, 0.384102, 0.561493]

y = [0.131113, 0.123865, 0.150355, 0.138914, 0.140417, 0.119358, 0.130019, 0.129782, 0.113508, 0.13434,
     0.15162, 0.125768, 0.128473, 0.128056, 0.114403, 0.142878, 0.139192, 0.118033, 0.132616, 0.133043,
     0.133973, 0.146611, 0.129792]

# Sort the points by x so that fill_between draws a clean band
coords = sorted(zip(x, y), key=lambda point: point[0])
x, y = zip(*coords)

(m, b) = np.polyfit(x, y, 1)
Y_Polyval = np.polyval([m, b], x)
plt.plot(x, Y_Polyval, linewidth=3, c="black")
plt.scatter(x, y, s=250, linewidths=2, zorder=2)
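# Band half-width, using the formula from the question
CI = 1.96 * np.std(y) / np.mean(y)
plt.fill_between(x, (Y_Polyval-CI), (Y_Polyval+CI), color='blue', alpha=0.1)

plt.show()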

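Note that CI = 1.96 * np.std(y) / np.mean(y) gives a constant-width band; it is not the usual confidence interval of a regression line, which is narrowest at the mean of x and widens toward the ends. As a hedged sketch of that textbook band for simple linear regression, the hypothetical helpers below (regression_confidence_band and plot_band are illustrative names, not library functions) compute the standard error of the fitted mean on a grid of x values. The multiplier z = 1.96 is a normal approximation and an assumption here; for a small sample, a Student t quantile with n - 2 degrees of freedom would be slightly larger.

```python
import numpy as np


def regression_confidence_band(x, y, x_grid, z=1.96):
    """Approximate confidence band for the mean of a straight-line fit.

    Returns the fitted values on x_grid together with the lower and
    upper band edges. z=1.96 is a normal approximation (assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    n = x.size

    m, b = np.polyfit(x, y, 1)                # slope and intercept
    fit = m * x_grid + b                      # fitted line on the grid

    residuals = y - (m * x + b)
    s = np.sqrt(np.sum(residuals**2) / (n - 2))   # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)

    # Standard error of the fitted mean at each grid point
    se = s * np.sqrt(1.0 / n + (x_grid - x.mean()) ** 2 / sxx)
    return fit, fit - z * se, fit + z * se


def plot_band(x, y):
    """Draw the scatter plot, the fitted line, and the band."""
    import matplotlib.pyplot as plt  # imported here so the numeric part has no plotting dependency

    x_grid = np.linspace(min(x), max(x), 100)  # already sorted, so no manual sorting is needed
    fit, lower, upper = regression_confidence_band(x, y, x_grid)
    plt.scatter(x, y, s=250, linewidths=2, zorder=2)
    plt.plot(x_grid, fit, linewidth=3, c="black")
    plt.fill_between(x_grid, lower, upper, color="blue", alpha=0.1)
    plt.show()
```

Calling plot_band(x, y) with the lists from the question would draw the band around the black regression line. Evaluating the fit on a np.linspace grid also sidesteps the sorting step, since the grid is ordered by construction.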