Is it possible to set the norm parameter in df.plot.scatter?


I prefer df.plot.scatter() to plt.scatter() when doing data exploration. However, I'm unable to set the norm parameter through it.

Generate Data

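import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors

cmap = 'coolwarm'  # assumption: the original cmap definition is not shown; any diverging colormap works
n = 1000
data = dict(
    x = np.random.rand(n) + np.random.rand(1)[0],
    y = np.random.rand(n) + np.random.rand(1)[0],
    # color dimension
    z = np.exp(np.random.rand(n)) - np.exp(np.random.rand(n)).mean(),
)
# throw it in a dataframe
df = pd.DataFrame(data)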

Plotting with plt.scatter

The left plot uses CenteredNorm to ensure its colorbar is centered around zero no matter the distribution skew.

fig, (ax1, ax2) = plt.subplots(figsize=(20, 8), ncols=2)
sc = ax1.scatter(x=data['x'], y=data['y'], c=data['z'], cmap=cmap, norm=colors.CenteredNorm())
fig.colorbar(sc, ax=ax1)

sc = ax2.scatter(x=data['x'], y=data['y'], c=data['z'], cmap=cmap)
fig.colorbar(sc, ax=ax2)
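
To make the difference between the two colorbars concrete, here is a minimal sketch comparing what CenteredNorm and the default Normalize do to a small, skewed array. The array values are made up for illustration and are not taken from the data above.

```python
import numpy as np
from matplotlib import colors

# Illustrative, skewed values: the positive side reaches further than the negative side.
values = np.array([-1.0, 0.0, 3.0])

# CenteredNorm uses vcenter=0 by default and picks a symmetric range around it,
# so zero always lands in the middle of the colormap.
centered = colors.CenteredNorm()(values)

# The default Normalize stretches linearly from min to max,
# so zero lands wherever it happens to fall in that range.
default = colors.Normalize()(values)

print("CenteredNorm:", centered)
print("Normalize:   ", default)
```

With these values, zero maps to 0.5 under CenteredNorm but to 0.25 under Normalize, which is why only the left colorbar is centered on zero.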


Plotting with df.plot.scatter

df = pd.DataFrame(data)
fig, (ax1, ax2) = plt.subplots(figsize=(10, 4), ncols=2)
df.plot.scatter(x='x', y='y', c='z', norm=colors.CenteredNorm(), cmap=cmap, ax=ax1)
df.plot.scatter(x='x', y='y', c='z', cmap=cmap, ax=ax2)


Attempting the same with the pandas built-in plotting API raises the error:

TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'

Using kwargs parameters

kwargs = dict(norm=colors.CenteredNorm())
df.plot.scatter(x='x', y='y', c='z',
                **kwargs)

After a code correction from tdy, the snippet raises the same error:

TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'

Is there any way to set the norm parameter via the pandas built-in plotting API?

CodePudding user response:

Update: As of pandas 1.4.1, this is not possible due to the bug below. It will eventually be fixed by PR #45966.

df.plot.scatter passes its kwargs to df.plot, which in turn passes them to ax.scatter.

The issue is that pandas already sets a norm:


scatter = ax.scatter(
    # ...
    norm=norm,
    **self.kwds,
)

This norm is defined as either a BoundaryNorm or None:


if color_by_categorical:
    # ...
    norm = colors.BoundaryNorm(bounds, cmap.N)
else:
    norm = None

So passing another norm via kwargs will produce the "multiple values" error.

This can be reproduced in pure matplotlib:

fig, ax = plt.subplots()
ax.scatter(0, 42, norm=None, **{'norm': colors.CenteredNorm()})

# TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'
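
The same pattern can be sketched without matplotlib at all. In the snippet below, draw, wrapper_buggy and wrapper_fixed are hypothetical stand-ins, not pandas or matplotlib functions, and the pop-based variant is only one plausible way to avoid the clash; it is not claimed to be what PR #45966 does.

```python
def draw(x, y, norm=None, **kwargs):
    """Hypothetical stand-in for ax.scatter: returns the norm it received."""
    return norm


def wrapper_buggy(x, y, **kwds):
    # Sets its own norm AND forwards the user's kwds,
    # so a user-supplied norm arrives twice.
    norm = None
    return draw(x, y, norm=norm, **kwds)


def wrapper_fixed(x, y, **kwds):
    # Take a user-supplied norm out of kwds first, so it is passed exactly once.
    norm = kwds.pop("norm", None)
    return draw(x, y, norm=norm, **kwds)


error = None
try:
    wrapper_buggy(0, 42, norm="centered")
except TypeError as exc:
    error = str(exc)  # "... got multiple values for keyword argument 'norm'"

result = wrapper_fixed(0, 42, norm="centered")
print(error)
print(result)
```

The string "centered" is just a placeholder for a real norm object; the point is that the clash comes from ordinary Python keyword handling, not from anything specific to norms.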

If this functionality is important to you, consider opening a GitHub issue describing your use case. There is already a fix in progress via PR #45966.

CodePudding user response:

As mentioned by @tdy, unpacking kwargs doesn't do the trick.

The function df.plot.scatter takes the parameters x, y, s, c. Additional kwargs are passed to df.plot. The following parameters are supported:

  • x
  • y
  • kind
  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • figsize
  • use_index
  • title
  • grid
  • legend
  • style
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • xlim
  • ylim
  • rot
  • fontsize
  • colormap
  • table
  • yerr
  • xerr
  • secondary_y
  • sort_columns

...but it will not take the parameter norm. That would require extending pandas source code.
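
Until pandas accepts norm, one possible workaround is to keep the data in the DataFrame but call matplotlib's ax.scatter directly with its columns. This is a minimal sketch: the random data, the 'coolwarm' colormap and the Agg backend are assumptions chosen so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import colors

# Illustrative data with a skewed color dimension, similar in spirit to the question's.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.random(100),
    "y": rng.random(100),
    "z": np.exp(rng.random(100)) - 1.5,
})

fig, ax = plt.subplots()
# Pass the DataFrame columns straight to matplotlib, where norm is accepted.
sc = ax.scatter(df["x"], df["y"], c=df["z"],
                cmap="coolwarm", norm=colors.CenteredNorm())
fig.colorbar(sc, ax=ax, label="z")

# df.plot.scatter would label the axes automatically; here it is done by hand.
ax.set_xlabel("x")
ax.set_ylabel("y")
```

This gives up some of the convenience of df.plot.scatter, such as automatic axis labels, in exchange for full control over the norm.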
