Is it possible to set the norm parameter in df.plot.scatter?

Time:03-01

I prefer df.plot.scatter() to plt.scatter() when doing data exploration. However, I'm unable to set the norm parameter through it.

Generate Data

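import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import colors

n = 1000
data = dict(
    # the operator was lost in the source; "+" (a random offset) is assumed here
    x = np.random.rand(n) + np.random.rand(1)[0],
    y = np.random.rand(n) + np.random.rand(1)[0],
    # color dimension: skewed values shifted so they straddle zero
    z = np.exp(np.random.rand(n)) - np.exp(np.random.rand(n)).mean(),
)
# throw it in a dataframe
df = pd.DataFrame(data)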

Plotting with plt.scatter

The left plot uses CenteredNorm to ensure its colorbar is centered around zero no matter the distribution skew.

cmap='bwr'
fig, (ax1, ax2) = plt.subplots(figsize=(20, 8), ncols=2)
sc = ax1.scatter(x=data['x'], y=data['y'], c=data['z'], cmap=cmap, norm=colors.CenteredNorm())
fig.colorbar(sc, ax=ax1)

sc = ax2.scatter(x=data['x'], y=data['y'], c=data['z'], cmap=cmap)
fig.colorbar(sc, ax=ax2)
plt.show()


Plotting with df.plot.scatter

df = pd.DataFrame(data)
fig, (ax1, ax2) = plt.subplots(figsize=(10, 4), ncols=2)
df.plot.scatter(x='x', y='y', c='z', norm=colors.CenteredNorm(), cmap=cmap, ax=ax1)
df.plot.scatter(x='x', y='y', c='z', cmap=cmap, ax=ax2)

plt.show()

Attempting the same with the pandas built-in plotting API raises the error:

TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'

Using kwargs parameters

kwargs = dict(norm=colors.CenteredNorm())
df.plot.scatter(x='x', y='y', c='z',
                cmap=cmap,
                ax=ax1,
                **kwargs)

After a code correction from tdy, the snippet raises the same error:

TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'

Is there any way to set the norm parameter via the pandas built-in plotting API?

Answer:

Update: As of pandas 1.4.1, this is not possible due to the bug below. It will eventually be fixed by PR #45966.


df.plot.scatter passes kwargs to df.plot, which in turn passes them to ax.scatter.

The issue is that pandas already sets a norm:

plotting/_matplotlib/core.py#L1114-L1122

scatter = ax.scatter(
    data[x].values,
    data[y].values,
    c=c_values,
    label=label,
    cmap=cmap,
    norm=norm,
    **self.kwds,
)

This norm is defined as either a BoundaryNorm or None:

plotting/_matplotlib/core.py#L1095-L1103

if color_by_categorical:
    # ...
    norm = colors.BoundaryNorm(bounds, cmap.N)
else:
    norm = None

So passing another norm via kwargs will produce the "multiple values" error.

This can be reproduced in pure matplotlib:

fig, ax = plt.subplots()
ax.scatter(0, 42, norm=None, **{'norm': colors.CenteredNorm()})

# TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'
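The same mechanism can be shown without matplotlib at all. The following is a minimal sketch, not the pandas code: scatter, plot_clashing and plot_popping are hypothetical stand-ins. It illustrates why forwarding norm both explicitly and inside kwargs fails, and one way a wrapper could avoid the clash by popping norm out of kwargs first. Whether the pandas fix takes this exact shape is an assumption.

```python
def scatter(x, y, norm=None, **kwargs):
    """Hypothetical stand-in for ax.scatter; returns the norm it received."""
    return norm


def plot_clashing(x, y, **kwds):
    # Mirrors the failing pattern: norm is passed explicitly
    # and may also be present inside kwds.
    return scatter(x, y, norm=None, **kwds)


def plot_popping(x, y, **kwds):
    # One possible remedy: remove norm from kwds before forwarding the rest.
    norm = kwds.pop("norm", None)
    return scatter(x, y, norm=norm, **kwds)


try:
    plot_clashing(0, 42, norm="centered")
    clash_message = None
except TypeError as exc:
    # The call fails before scatter() even runs.
    clash_message = str(exc)

popped_norm = plot_popping(0, 42, norm="centered")
print(clash_message)
print(popped_norm)
```

With the popping variant, a user-supplied norm reaches the inner function exactly once, and the default of None is used only when the caller did not pass one.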

If this functionality is important to you, consider opening a GitHub issue describing your use case. There is already a fix in progress via PR #45966.
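Until a fix is available, one possible workaround for the specific case of a zero-centered colorbar is to pass symmetric vmin and vmax instead of a norm, since those keywords are forwarded to ax.scatter without clashing. This is a sketch under assumptions: the data below is made up for illustration, and it only mimics the default behaviour of CenteredNorm() (centre at zero, half-range taken from the data), not arbitrary norms.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data only (not the data from the question).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "x": rng.random(n),
    "y": rng.random(n),
    "z": np.exp(rng.random(n)) - 1.5,  # skewed values on both sides of zero
})

# Symmetric limits around zero: the largest absolute value in the color column.
halfrange = df["z"].abs().max()

fig, ax = plt.subplots()
df.plot.scatter(x="x", y="y", c="z", cmap="bwr",
                vmin=-halfrange, vmax=halfrange, ax=ax)

# The scatter artist is the only collection on this axes.
vmin, vmax = ax.collections[0].get_clim()
print(vmin, vmax)
```

This keeps the convenience of df.plot.scatter, at the cost of computing the limits by hand.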

Answer:

As mentioned by @tdy, unpacking kwargs doesn't do the trick.

The function df.plot.scatter takes the parameters x, y, s, and c. Additional kwargs are passed to df.plot. The following parameters are supported:

  • x
  • y
  • kind
  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • figsize
  • use_index
  • title
  • grid
  • legend
  • style
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • xlim
  • ylim
  • rot
  • fontsize
  • colormap
  • table
  • yerr
  • xerr
  • secondary_y
  • sort_columns

...but norm is not among them. Supporting it would require changing the pandas source code.
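If any norm is needed and the pandas source cannot be changed, an alternative is to call matplotlib's ax.scatter directly and hand it the DataFrame through its data keyword, so columns can still be referred to by name. This is a sketch with made-up data; the axis and colorbar labels that pandas would add automatically are set by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import colors

# Illustrative data only (not the data from the question).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "x": rng.random(n),
    "y": rng.random(n),
    "z": np.exp(rng.random(n)) - 1.5,
})

fig, ax = plt.subplots()
# With data=df, the strings "x", "y" and "z" are looked up as column names.
sc = ax.scatter("x", "y", c="z", data=df, cmap="bwr",
                norm=colors.CenteredNorm())
fig.colorbar(sc, ax=ax, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("scatter_centered.png")
```

The call stays close to the df.plot.scatter signature, which makes it easy to switch back once pandas accepts norm.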
