Is there a way to color scatter plot markers based on the histogram bin into which each value falls? Given the following example data:
df = pd.DataFrame({
'length': [1.5, 0.5, 1.2, 0.9, 3],
'width': [0.7, 0.2, 0.15, 0.2, 1.1],
'price': [12, 9, 2, 1, 10]})
I can plot a histogram with hist = df.hist(bins=3, column='price')
as well as a scatter plot with df.plot.scatter(x='length', y='width').
But I cannot figure out a way to combine the two. What I'd expect to see is 3 colors on the scatter plot, with each marker's color assigned according to the bin its price falls into. Any idea how this can be achieved?
CodePudding user response:
The following approach first creates a colormap with 3 colors, and uses the price column to define the color. The scatter plot will add a colorbar indicating the relation between the colors and the prices.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({
'length': [1.5, 0.5, 1.2, 0.9, 3],
'width': [0.7, 0.2, 0.15, 0.2, 1.1],
'price': [12, 9, 2, 1, 10]})
num_colors = 3
cmap = plt.get_cmap('rainbow', num_colors)
df.plot.scatter(x='length', y='width', c='price', cmap=cmap)
plt.tight_layout()
plt.show()
PS: If you want to indicate the price borders, you can add these as colorbar ticks:
ax = df.plot.scatter(x='length', y='width', c='price', cmap=cmap)
ax.collections[0].colorbar.set_ticks(np.linspace(df['price'].min(), df['price'].max(), num_colors + 1))
CodePudding user response:
The histogram just takes the range of values and splits it into equal-width bins.
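Building on that, here is a minimal sketch of one way to compute the bin of each price explicitly and color by it. It assumes the same equal-width binning as a histogram with bins=3; the names num_bins and price_bin are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1],
    'price': [12, 9, 2, 1, 10]})

num_bins = 3  # illustrative name; matches bins=3 from the question

# Equal-width bin edges over [min, max], as np.histogram would compute them
edges = np.histogram_bin_edges(df['price'], bins=num_bins)

# Digitize against the interior edges only, so indices run 0..num_bins-1
# and the maximum value lands in the last bin (same convention as np.histogram)
df['price_bin'] = np.digitize(df['price'], edges[1:-1])

# Discrete colormap with one color per bin; color markers by bin index
cmap = plt.get_cmap('rainbow', num_bins)
ax = df.plot.scatter(x='length', y='width', c='price_bin', cmap=cmap)
# call plt.show() to display the figure when running this as a script
```

Here the colorbar shows the bin index rather than the price itself. If you would rather see prices on the colorbar, coloring by 'price' with a discrete colormap, as in the other answer, should give the same grouping.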