Using histogram info in scatter plot

Time:02-20

Is there a way to color scatter plot markers according to the histogram bin into which each value falls? Given the following example data:

import pandas as pd

df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1],
    'price': [12, 9, 2, 1, 10]})

I can plot a histogram with hist = df.hist(bins=3, column='price') and a scatter plot with df.plot.scatter(x='length', y='width').

But I cannot figure out how to combine the two. What I'd like to see is a scatter plot with 3 colors, where each marker's color is assigned according to the bin its price falls into. Any idea how this can be achieved?

Answer:

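The following approach first creates a colormap with 3 discrete colors, then uses the price column to determine each marker's color. The scatter plot adds a colorbar showing how the colors map to prices. Because df.hist(bins=3) likewise splits the price range into three equal-width intervals, each color corresponds to one histogram bin.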

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1],
    'price': [12, 9, 2, 1, 10]})
num_colors = 3
# Discrete colormap with one color per equal-width price interval
cmap = plt.get_cmap('rainbow', num_colors)
# Color each marker by its price; pandas adds the colorbar automatically
df.plot.scatter(x='length', y='width', c='price', cmap=cmap)
plt.tight_layout()
plt.show()

[Image: scatter plot colored via the third column (price)]

PS: If you want to show the price boundaries between the bins, you can add them as colorbar ticks:

ax = df.plot.scatter(x='length', y='width', c='price', cmap=cmap)
# Ticks at the num_colors + 1 boundaries between the color bands
ax.collections[0].colorbar.set_ticks(np.linspace(df['price'].min(), df['price'].max(), num_colors + 1))
plt.show()
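As a quick sanity check (a sketch added here, not part of the original answer), the tick positions above coincide with the bin edges NumPy computes for a 3-bin histogram of the same prices, which is the equal-width binning a 3-bin histogram uses. With this example data the middle bin happens to be empty, so only two of the three colors actually appear in the plot.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1],
    'price': [12, 9, 2, 1, 10]})
num_colors = 3

# Tick positions used for the colorbar in the answer above
ticks = np.linspace(df['price'].min(), df['price'].max(), num_colors + 1)

# Equal-width bin edges and per-bin counts for a 3-bin histogram of the prices
counts, edges = np.histogram(df['price'], bins=num_colors)

print(np.round(edges, 3))
print(counts)  # counts per bin: 2, 0, 3
```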

[Image: showing the colorbar bins]

Answer:

The histogram simply takes the range of the values and splits it into equal-size bins.
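To make that concrete, here is a minimal sketch of equal-width binning done by hand with NumPy; `num_bins` and `bin_index` are illustrative names, not part of either answer. Each value's bin is `floor((x - min) / width)`, with the maximum clipped into the last bin because the last bin is closed on the right. A value lying exactly on an interior edge could in principle be assigned differently from `np.histogram` because of floating-point rounding; that does not happen with this data.

```python
import numpy as np
import pandas as pd

prices = pd.Series([12, 9, 2, 1, 10], name='price')
num_bins = 3

lo, hi = prices.min(), prices.max()
width = (hi - lo) / num_bins  # (12 - 1) / 3, about 3.667

# Index of the equal-width bin each value falls into; the maximum would
# land at index num_bins, so clip it into the last (closed) bin
bin_index = np.floor((prices - lo) / width).astype(int).clip(upper=num_bins - 1)

print(bin_index.tolist())  # [2, 2, 0, 0, 2]
```

These indices (0, 1, 2) can then be used directly as the color for each point.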

You can do the same with a binned scatter plot: compute those bins yourself and color each point by the bin it falls into.
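One way to build such a binned scatter plot is with `pd.cut`, which splits a column into a given number of equal-width bins (pandas nudges the range slightly so the minimum is included). This is a tool choice made for illustration, not necessarily what the answer had in mind, and `price_bin` is a hypothetical helper column. The sketch draws one scatter call per bin so that each bin gets its own color and a labelled legend entry.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'length': [1.5, 0.5, 1.2, 0.9, 3],
    'width': [0.7, 0.2, 0.15, 0.2, 1.1],
    'price': [12, 9, 2, 1, 10]})
num_bins = 3

# Split the price range into equal-width bins (a categorical of intervals)
df['price_bin'] = pd.cut(df['price'], bins=num_bins)

colors = plt.get_cmap('rainbow', num_bins)  # one discrete color per bin
fig, ax = plt.subplots()
for i, interval in enumerate(df['price_bin'].cat.categories):
    # Select the rows whose price falls into bin i
    subset = df[df['price_bin'].cat.codes == i]
    ax.scatter(subset['length'], subset['width'], color=colors(i), label=str(interval))

ax.set_xlabel('length')
ax.set_ylabel('width')
ax.legend(title='price bin')
plt.tight_layout()
plt.show()
```

Drawing one call per bin, rather than passing `c='price'`, gives a legend with the bin intervals instead of a continuous colorbar.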
