Scatter plot to show the majorities and include extreme numbers

Time:12-01

I have some simple data, shown below, that I want to display in a scatter plot.

It works well when there are no outliers (i.e. extremely large numbers).

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

dates = ["2021-01-01",
"2021-01-01", "2021-01-06",
"2021-01-08", "2021-01-12",
"2021-02-01", "2021-02-11",
"2021-02-12", "2021-02-15",
"2021-02-16", "2021-03-11",
"2021-03-21", "2021-03-22",
"2021-03-23", "2021-03-24",
"2021-04-02", "2021-04-12",
"2021-04-22", "2021-04-26",
"2021-04-30"]

numbers = [6400,
5100,5000,
4000,3686,
9000,8050,
8000,6050,
6000,9000,
8500,7800,
7000,6000,
10000,9600,
8000,7883,
6686]

dates = [pd.to_datetime(d) for d in dates]

plt.scatter(dates, numbers, s=100, c='red')
plt.show()


But when there are one or more extreme numbers, for example when the last number, 6686, becomes 66860, most of the points in the new plot are squashed together and hard to tell apart (because of the new y-axis range).


What's a good way to get a scatter plot like the first one (keeping the y-axis as it was) while still showing the extreme numbers?

The purpose of the chart is to show and focus on the distribution of the points under 10000, while also noting that there are extreme numbers.

Thank you.

CodePudding user response:

If you don't want to use a log scale, you can break the plot in two (or more) and plot the values below/above a threshold:

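# Continues from the question's code: numbers, dates, pd and plt are already defined.
df = pd.DataFrame({'num': numbers}, index=dates)
thresh = 12000  # values at or above this go in the top panel

# Two stacked panels sharing the x-axis; the bottom one is three times taller.
f, (ax1, ax2) = plt.subplots(nrows=2, sharex=True,
                             gridspec_kw={'height_ratios': (1, 3)},
                             figsize=(10, 4))

# mask() replaces the values where the condition holds with NaN,
# and scatter() skips NaN points.
lows = df['num'].mask(df['num'].ge(thresh))   # keep values below thresh
highs = df['num'].mask(df['num'].lt(thresh))  # keep values at or above thresh

ax2.scatter(df.index, lows)   # bottom panel: the bulk of the data
ax1.scatter(df.index, highs)  # top panel: the extreme values
plt.show()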

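Output: two stacked panels sharing the x-axis. The small top panel holds the values at or above thresh, and the taller bottom panel holds the values below it.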
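
The answer sets aside a log scale as the other option. For comparison, here is a minimal sketch of that alternative, assuming a single shared axis is acceptable. The helper name scatter_log_y and the shortened five-point sample (ending in the outlier 66860) are illustrative and not from the thread:

```python
from datetime import datetime

import matplotlib.pyplot as plt


def scatter_log_y(dates, numbers):
    """Scatter plot with a logarithmic y-axis (hypothetical helper for illustration)."""
    # Parse the ISO date strings so matplotlib treats the x-axis as dates.
    xs = [datetime.fromisoformat(d) for d in dates]
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.scatter(xs, numbers, s=100, c="red")
    # A log axis compresses large values, so the outlier stays on the same chart.
    ax.set_yscale("log")
    return fig, ax


# Shortened sample of the question's data; the last value is the outlier 66860.
dates = ["2021-01-01", "2021-02-01", "2021-03-11", "2021-04-02", "2021-04-30"]
numbers = [6400, 9000, 9000, 10000, 66860]

fig, ax = scatter_log_y(dates, numbers)
```

Call plt.show() afterwards to display the figure. A log axis keeps every point on one chart, but it also compresses the spacing among the values under 10000, so the split-panel approach may suit the stated goal better.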
