How to change scaling of x/y axis to plot outliers in pandas dataframe?

Time:06-28

I am trying to plot a set of data points as a scatter plot, but the set contains a few huge anomalies. Most values fall between 0 and 100, but occasionally there is an anomalous point of around 100,000. As a result, any plot I draw (scatter plot, box plot, or anything else) zooms out so far to fit every point that the 99% of points between 0 and 100 collapse into a tiny dot. Is there a way to scale the axis so that this 99% is displayed at a sensible scale, with the axis then skipping ahead to the anomaly's value so that it still fits on the graph?

Here is how the graphs look:

[Image: box plot]

[Image: scatter plot]

CodePudding user response:

You can set the x/y axis scale to log so that the outliers still fit on the chart, or set a limit on the axis (for example, plt.xlim(0, 200)) to hide the anomalies from the chart. The log-scale approach looks like this:

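import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('whitegrid')

plt.figure(figsize=(20, 12))
# Sample data: mostly small values plus two extreme outliers
data = [1, 2, 3, 4, 5, 55, 1, 6, 7, 24, 67, 33, 41, 75, 100_000, 1_000_000]

# Top left: box plot on the default linear axis
plt.subplot(2, 2, 1)
plt.title('basic boxplot')
sns.boxplot(x=data)

# Top right: same box plot with a logarithmic x axis
plt.subplot(2, 2, 2)
plt.title('log x axis')
b = sns.boxplot(x=data)
b.set_xscale('log')

# Bottom left: scatter plot on the default linear axes
plt.subplot(2, 2, 3)
plt.title('basic scatter')
sns.scatterplot(x=data, y=data)

# Bottom right: same scatter plot with both axes logarithmic
plt.subplot(2, 2, 4)
plt.title('log x/y scatter')
s = sns.scatterplot(x=data, y=data)
s.set_xscale('log')
s.set_yscale('log')
plt.show()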
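
The axis-limit alternative mentioned at the start of this answer is not shown in the code above. A minimal sketch of it, using plain matplotlib and the same sample data, might look like the following. The cutoff of 200 and the names upper and hidden are illustrative assumptions, not something from the original question; pick a cutoff that suits your own data.

```python
import matplotlib.pyplot as plt

# Same sample data as above: mostly small values plus two extreme outliers
data = [1, 2, 3, 4, 5, 55, 1, 6, 7, 24, 67, 33, 41, 75, 100_000, 1_000_000]

# Assumed cutoff for the visible range (illustrative value)
upper = 200

fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(range(len(data)), data)

# Restrict the y axis; points above the limit are still in the data,
# they are simply drawn outside the visible area
ax.set_ylim(0, upper)

# Count the off-scale points so the title can tell the reader they exist
hidden = [v for v in data if v > upper]
ax.set_title(f'y axis limited to 0-{upper} ({len(hidden)} points off-scale)')
```

Call plt.show() afterwards to display the figure. Unlike the log scale, this keeps the axis linear, but the outliers are no longer visible, so it is worth stating somewhere (here, in the title) that points were left off the chart.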


CodePudding user response:

Use the plt.axis() function to set the limits of both axes in a single call.

plt.axis([x_min, x_max, y_min, y_max]), where x_min, x_max, y_min, and y_max are the coordinate limits for the two axes.
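
As a rough illustration of that call, the sketch below plots sample data against itself and then restricts both axes at once. The data list and the limit values (0 to 110 on each axis) are assumptions chosen for the example, not values from the question.

```python
import matplotlib.pyplot as plt

# Illustrative data: mostly small values plus two extreme outliers
data = [1, 2, 3, 4, 5, 55, 1, 6, 7, 24, 67, 33, 41, 75, 100_000, 1_000_000]

# Assumed limits that cover the typical 0-100 range with a small margin
x_min, x_max, y_min, y_max = 0, 110, 0, 110

plt.figure(figsize=(6, 6))
plt.scatter(data, data)

# Set both axis ranges in one call; plt.axis returns the limits now in effect
limits = plt.axis([x_min, x_max, y_min, y_max])
```

Call plt.show() to display the result. The two outliers fall outside the visible area, so only the points in the 0 to 100 range are drawn.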
