Create a bar chart with bars colored according to a category and line on the same chart

I trained a model to predict a value and I want to make a bar chart that plots target - prediction for each sample, and then color these bars according to a category. I then want to add two horizontal lines for plus or minus sigma around the central axis, so it's clear which predictions are very far off. Imagine we know sigma == 0.3 and we have a dataframe

error	sample_id	category
.1	1	'A'
.4	2	'A'
.1	3	'B'
-.2	4	'B'
-.1	5	'C'

How could I do this? I've managed to plot the errors and the plus or minus sigma lines using only matplotlib; here it is to communicate what I mean.

CodePudding user response：

You'll find the pd.Series.transform() and/or pd.DataFrame.apply() methods quite useful. Essentially, you can map each value of your input columns (in this case errors) into some valid color value, returning a pd.Series of colors that's the same shape as errors.

The phrasing of the question is unclear, but it sounds like you want a single pair of lines for each category? In that case, you will first need to do a pd.Series.groupby() operation to get the shape that you want before the transform operation. Probably just a series of length 3, for your A B C categories.

Then, this Series (whether it is of length len(df) or df.category.nunique()) can be passed into your plt.bar method as the color argument.
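As a minimal sketch of the mapping step, assuming the colors should follow the category column (as the question asks) rather than the error values: the palette dictionary below is a hypothetical choice of named matplotlib colors, not something from the question.

```python
import pandas as pd

# The example data from the question.
df = pd.DataFrame({
    "error": [0.1, 0.4, 0.1, -0.2, -0.1],
    "sample_id": [1, 2, 3, 4, 5],
    "category": ["A", "A", "B", "B", "C"],
})

# Hypothetical palette: one named matplotlib color per category.
palette = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

# Series.map looks each category up in the dict, giving one color per row.
colors = df["category"].map(palette)
print(colors.tolist())
```

The resulting Series has length len(df), so it lines up one-to-one with the bars when passed as the color argument.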

CodePudding user response：

This is actually very easy, I just didn't understand the 'color' option of plt.bar. If it is a list of length equal to the number of bars, then it will color each bar with the corresponding color. It's as simple as

plt.bar(x, y, color=z)
# len(x) == len(y) == len(z), and z is an array of colors

As krukah mentions, you just need to translate categories to colors. I picked a color map, made a dictionary that picked a color for each unique category, and then turned the cats array (a 2d np array, each row encodes a category) into an array of colors.

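import numpy as np
import matplotlib.pyplot as plt

# cats: 2d np array, each row encodes a category
unique_cats = np.unique(cats, axis=0)
n_unique = unique_cats.shape[0]
# n_unique evenly spaced positions in [0, 1) for sampling the colormap
for_picking = np.linspace(0, 1, n_unique, endpoint=False)

cmap = plt.get_cmap('plasma')  # plt.cm.get_cmap is deprecated in recent matplotlib
color_dict = {}

# this for loop fills in the dictionary by picking colors from the cmap
for i in range(n_unique):
    color_dict[str(unique_cats[i])] = cmap(for_picking[i])

# one color per row of cats, ready to pass as color= to plt.bar
color_cats = [color_dict[str(cat)] for cat in cats]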
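For completeness, here is a rough end-to-end sketch using the dataframe and sigma == 0.3 from the question. It assumes the category is a plain string column rather than the 2d array above, and the line styles, axis labels, and legend are illustrative choices only.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

sigma = 0.3  # value given in the question

df = pd.DataFrame({
    "error": [0.1, 0.4, 0.1, -0.2, -0.1],
    "sample_id": [1, 2, 3, 4, 5],
    "category": ["A", "A", "B", "B", "C"],
})

# Sample evenly spaced colors from the 'plasma' colormap, one per category.
categories = sorted(df["category"].unique())
cmap = plt.get_cmap("plasma")
color_dict = {cat: cmap(i / len(categories)) for i, cat in enumerate(categories)}
bar_colors = [color_dict[cat] for cat in df["category"]]

fig, ax = plt.subplots()
bars = ax.bar(df["sample_id"], df["error"], color=bar_colors)

# Dashed horizontal lines at +sigma and -sigma, plus the central axis.
ax.axhline(sigma, color="gray", linestyle="--")
ax.axhline(-sigma, color="gray", linestyle="--")
ax.axhline(0, color="black", linewidth=0.8)

# One legend entry per category, built from proxy patches because
# all bars come from a single ax.bar call.
handles = [Patch(color=color_dict[cat]) for cat in categories]
ax.legend(handles, categories, title="category")

ax.set_xlabel("sample_id")
ax.set_ylabel("target - prediction")
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Bars that cross a dashed line (sample 2 here, with error 0.4) are the predictions more than one sigma off.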

Hopefully that helps someone some day.