pandas.DataFrame.plot showing colormap inconsistently

Time:10-08

I am trying to make some plots using the colormap "jet". It kept rendering as viridis, so I searched SE and reduced the problem to two very simple plots:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.arange(0, 100)
y = x
t = x
df = pd.DataFrame([x,y]).T

df.plot(kind="scatter", x=0, y=1, c=t, cmap="jet")


x = np.arange(0, 100.1)
y = x
t = x
df = pd.DataFrame([x,y]).T

df.plot(kind="scatter", x=0, y=1, c=t, cmap="jet")


Any thoughts on what is going on here? It seems to be related to the dtype of the columns in the DataFrame: adding dtype="float" to the first snippet gives the same result as the second snippet. I don't see why that would be the case, though.
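The dtype difference between the two snippets can be confirmed directly. A minimal sketch, assuming only numpy and pandas; the names df_int and df_float are introduced here for illustration and are not part of the original code:

```python
import numpy as np
import pandas as pd

# np.arange with integer arguments yields an integer array;
# a float stop value (100.1) yields a float array instead.
df_int = pd.DataFrame([np.arange(0, 100), np.arange(0, 100)]).T
df_float = pd.DataFrame([np.arange(0, 100.1), np.arange(0, 100.1)]).T

print(df_int.dtypes)    # integer columns
print(df_float.dtypes)  # float columns
```

So the only difference between the two DataFrames, apart from one extra row, is the column dtype.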

Naturally, what I really would like is a workaround if there isn't something wrong with my code.

CodePudding user response:

It does seem to be related to the pandas scatter plot and, as you pointed out, to the float dtype. Some more details are at the end.

A workaround is to call matplotlib directly.
The resulting plot looks the same, but the cmap="jet" setting is also honored for float data:


Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.arange(0, 100.1)
y = x
t = x
df = pd.DataFrame([x,y]).T

fig, ax = plt.subplots(1,1)

sc_plot = ax.scatter(df[0], df[1], c=t, cmap="jet")
fig.colorbar(sc_plot)

ax.set_ylabel('1')
ax.set_xlabel('0')

plt.show()

Or a shorter version, a little closer to the brief df.plot call, using pyplot instead of the object-oriented interface (x, y and t as defined above):

df = pd.DataFrame([x,y]).T

sc_plot = plt.scatter(df[0], df[1], c=t, cmap="jet")
plt.colorbar(sc_plot)
plt.ylabel('1')
plt.xlabel('0')
plt.show()
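To confirm which colormap was applied without judging it by eye, the scatter artist can be asked directly. A small sketch, assuming a non-interactive backend ("Agg") so that it runs without a display; the data mirrors the float case from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0, 100.1)  # float data, as in the second snippet

fig, ax = plt.subplots(1, 1)
sc_plot = ax.scatter(t, t, c=t, cmap="jet")

# The scatter artist keeps a reference to the colormap it maps values through.
print(sc_plot.get_cmap().name)  # jet
```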

Concerning the root cause of why pandas df.plot isn't following the cmap setting:

The closest I could find is that the c parameter of the pandas scatter plot is documented as accepting

str, int or array-like

(although I'm not sure why t isn't treated like the index, which would be int again).

Even df.plot(kind="scatter", x=0, y=1, c=df.index.values.tolist(), cmap='jet') falls back to viridis, although df.index.values.tolist() clearly contains only ints.

This is all the more strange because pandas df.plot also uses matplotlib by default:

Uses the backend specified by the option plotting.backend. By default, matplotlib is used.
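Since df.plot returns a matplotlib Axes, the colormap pandas ended up using can be read back from the plotted artist. The sketch below is only a diagnostic, not an explanation of the behaviour: applied_cmap_name is a hypothetical helper introduced here, and what it prints depends on the installed pandas and matplotlib versions, so no expected output is given.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import numpy as np
import pandas as pd


def applied_cmap_name(values):
    """Hypothetical helper: scatter-plot values against themselves with
    df.plot(cmap="jet") and return the colormap name found on the artist."""
    df = pd.DataFrame([values, values]).T
    ax = df.plot(kind="scatter", x=0, y=1, c=values, cmap="jet")
    # The scatter points are the first collection on the returned axes.
    return ax.collections[0].get_cmap().name


print("int data:  ", applied_cmap_name(np.arange(0, 100)))
print("float data:", applied_cmap_name(np.arange(0, 100.1)))
```

If the two printed names differ, the fallback is happening inside the pandas plotting layer rather than in matplotlib itself.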
