I prefer df.plot.scatter() rather than plt.scatter() when doing data exploration. However, I'm unable to set the norm parameter through the pandas plotting API.
Generate Data
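import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import colors

n = 1000
data = dict(
    # NOTE: the operator between the two rand() calls was lost in the source;
    # a random scale factor (*) is assumed here
    x=np.random.rand(n) * np.random.rand(1)[0],
    y=np.random.rand(n) * np.random.rand(1)[0],
    # color dimension
    z=np.exp(np.random.rand(n)) - np.exp(np.random.rand(n)).mean(),
)
# throw it in a dataframe
df = pd.DataFrame(data)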
Plotting with plt.scatter
The left plot uses CenteredNorm to ensure its colorbar is centered around zero no matter the distribution skew.
cmap='bwr'
fig, (ax1, ax2) = plt.subplots(figsize=(20, 8), ncols=2)
sc = ax1.scatter(x=data['x'], y=data['y'], c=data['z'], cmap=cmap, norm=colors.CenteredNorm())
fig.colorbar(sc, ax=ax1)
sc = ax2.scatter(x=data['x'], y=data['y'], c=data['z'], cmap=cmap)
fig.colorbar(sc, ax=ax2)
plt.show()
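To make the effect of CenteredNorm concrete, here is a minimal sketch on three hand-picked values (the toy array is an assumption for illustration, not data from the question). It compares where zero lands on the 0..1 colormap scale under CenteredNorm and under the default Normalize:

```python
import numpy as np
from matplotlib import colors

# Skewed toy values: the positive side reaches further from zero than the negative side
z = np.array([-1.0, 0.0, 3.0])

centered = colors.CenteredNorm()  # symmetric limits around vcenter=0
default = colors.Normalize()      # limits taken from the data's min and max

centered_vals = np.asarray(centered(z))
default_vals = np.asarray(default(z))

# Under CenteredNorm zero maps to 0.5, the middle of the colormap;
# under Normalize zero maps to 0.25 because the limits are -1 and 3
print(centered_vals)
print(default_vals)
```

With a diverging colormap such as 'bwr', only the centered version guarantees that white corresponds to zero.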
Plotting with df.plot.scatter
df = pd.DataFrame(data)
fig, (ax1, ax2) = plt.subplots(figsize=(10, 4), ncols=2)
df.plot.scatter(x='x', y='y', c='z', norm=colors.CenteredNorm(), cmap=cmap, ax=ax1)
df.plot.scatter(x='x', y='y', c='z', cmap=cmap, ax=ax2)
plt.show()
Attempting the same with the pandas built-in plotting API raises the error:
TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'
Using kwargs parameters
kwargs = dict(norm=colors.CenteredNorm())
df.plot.scatter(
    x='x', y='y', c='z',
    cmap=cmap,
    ax=ax1,
    **kwargs,
)
After a code correction from tdy, the snippet raises the same error:
TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'
Is there any way to set the norm parameter via the pandas built-in plotting API?
CodePudding user response:
Update: As of pandas 1.4.1, this is not possible due to the bug below. It will eventually be fixed by PR #45966.
df.plot.scatter passes kwargs to df.plot, which passes kwargs to ax.scatter.
The issue is that pandas already sets a norm:
plotting/_matplotlib/core.py#L1114-L1122
scatter = ax.scatter(
    data[x].values,
    data[y].values,
    c=c_values,
    label=label,
    cmap=cmap,
    norm=norm,
    **self.kwds,
)
This norm is defined as either a BoundaryNorm or None:
plotting/_matplotlib/core.py#L1095-L1103
if color_by_categorical:
    # ...
    norm = colors.BoundaryNorm(bounds, cmap.N)
else:
    norm = None
So passing another norm via kwargs will produce the "multiple values" error.
This can be reproduced in pure matplotlib:
fig, ax = plt.subplots()
ax.scatter(0, 42, norm=None, **{'norm': colors.CenteredNorm()})
# TypeError: matplotlib.axes._axes.Axes.scatter() got multiple values for keyword argument 'norm'
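The same mechanism can be shown without any plotting library. In this minimal sketch, inner and outer are hypothetical stand-ins for ax.scatter and the pandas wrapper: the wrapper hard-codes norm and then forwards the caller's keyword arguments, so a user-supplied norm collides with it.

```python
def inner(x, y, norm=None, **kwargs):
    """Stand-in for ax.scatter (hypothetical)."""
    return norm


def outer(x, y, **kwds):
    """Stand-in for the pandas wrapper: sets norm itself, then forwards kwds."""
    return inner(x, y, norm=None, **kwds)


message = ""
try:
    outer(0, 42, norm="centered")  # caller tries to supply its own norm
except TypeError as exc:
    message = str(exc)

print(message)
```

Calling outer without norm works, which is why the pandas call only fails once norm is passed.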
If this functionality is important to you, consider opening a GitHub issue describing your use case. There is already a fix in progress via PR #45966.
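On a pandas version that still has this bug, one possible workaround is to skip the pandas wrapper and call ax.scatter directly, letting Matplotlib look up the columns through its data argument. A minimal sketch, assuming a toy DataFrame with columns x, y, and z (the seed, sample size, and Agg backend are illustrative choices, not part of the question):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import colors

# Toy data standing in for the question's DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.random(200),
    "y": rng.random(200),
    "z": np.exp(rng.random(200)) - 1.5,
})

fig, ax = plt.subplots(figsize=(5, 4))
norm = colors.CenteredNorm()

# Strings are resolved as column names because data=df is given
sc = ax.scatter("x", "y", c="z", data=df, cmap="bwr", norm=norm)

# pandas would label the axes automatically; here it is done by hand
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.colorbar(sc, ax=ax, label="z")
```

This keeps the column-name convenience of df.plot.scatter while leaving norm fully under the caller's control; the axis labels and colorbar have to be added manually.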
CodePudding user response:
As mentioned by @tdy, unpacking kwargs doesn't do the trick.
The function df.plot.scatter takes the parameters x, y, s, c. Additional kwargs are passed to df.plot. The following parameters are supported:
- x
- y
- kind
- ax
- subplots
- sharex
- sharey
- layout
- figsize
- use_index
- title
- grid
- legend
- style
- logx
- logy
- loglog
- xticks
- yticks
- xlim
- ylim
- rot
- fontsize
- colormap
- table
- yerr
- xerr
- secondary_y
- sort_columns
...but it will not take the parameter norm. That would require extending the pandas source code.