I trained a model to predict a value, and I want to make a bar chart that plots target - prediction for each sample, with the bars colored according to a category. I then want to add two horizontal lines at plus and minus sigma around the central axis, so it's clear which predictions are very far off. Imagine we know sigma == 0.3 and we have this dataframe:
error | sample_id | category |
---|---|---|
.1 | 1 | 'A' |
.4 | 2 | 'A' |
.1 | 3 | 'B' |
-.2 | 4 | 'B' |
-.1 | 5 | 'C' |
How could I do this? I've managed to plot just the errors and the plus and minus sigma lines using matplotlib alone; here it is to communicate what I mean.
CodePudding user response:
You'll find the pd.Series.transform() and/or pd.DataFrame.apply() methods quite useful. Essentially, you can map each value of your input column (in this case errors) into some valid color value, returning a pd.Series of colors that's the same shape as errors.
The phrasing of the question is unclear, but it sounds like you want a single pair of lines for each category? In that case, you will first need to do a pd.Series.groupby() operation to get the shape that you want before the transform operation, probably just a series of length 3 for your A, B, C categories.
Then, this Series (whether it is of length len(df) or df.category.nunique()) can be passed into your plt.bar method as the color argument.
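As a rough sketch of that idea, assuming the column being mapped is category (which is what the question asks to color by) and using the example dataframe from the question. The palette dictionary is a made-up lookup for illustration, and pd.Series.map() is used here as a simple alternative to transform() or apply() for a plain value-to-color lookup.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Example data copied from the question.
df = pd.DataFrame({
    "error": [0.1, 0.4, 0.1, -0.2, -0.1],
    "sample_id": [1, 2, 3, 4, 5],
    "category": ["A", "A", "B", "B", "C"],
})

# Hypothetical category-to-color lookup; any valid matplotlib colors work.
palette = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

# One color per row, in the same order as the rows of df.
bar_colors = df["category"].map(palette).tolist()

fig, ax = plt.subplots()
bars = ax.bar(df["sample_id"], df["error"], color=bar_colors)
```

Call plt.show() or fig.savefig(...) afterwards to view the chart.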
CodePudding user response:
This is actually very easy; I just didn't understand the color option of plt.bar. If color is a list whose length equals the number of bars, then each bar is colored with the corresponding entry. It's as simple as
plt.bar(x, y, color=z)
# len(x) == len(y) == len(z), and z is an array of colors
As krukah mentions, you just need to translate categories to colors. I picked a color map, made a dictionary that picked a color for each unique category, and then turned the cats array (a 2d np array, each row encodes a category) into an array of colors.
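import numpy as np
import matplotlib.pyplot as plt

# cats is a 2D np array in which each row encodes one sample's category
unique_cats = np.unique(cats, axis=0)
n_unique = unique_cats.shape[0]
# evenly spaced positions in [0, 1) at which to sample the colormap
for_picking = np.arange(0, 1, 1 / n_unique)
cmap = plt.get_cmap('plasma')  # plt.cm.get_cmap is deprecated since matplotlib 3.7
color_dict = {}
# this for loop fills in the dictionary by picking colors from the cmap
for i in range(n_unique):
    color_dict[str(unique_cats[i])] = cmap(for_picking[i])
# one color per sample, in the same order as cats
color_cats = [color_dict[str(cat)] for cat in cats]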
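For the exact setup in the question, here is a minimal end-to-end sketch. It assumes the categories are plain strings in a dataframe column rather than rows of a 2D array, draws the plus and minus sigma lines with ax.axhline, and adds a legend built from Patch handles. The legend and the outliers list are extras for illustration, not something the question required.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

sigma = 0.3  # value given in the question

# Example data copied from the question.
df = pd.DataFrame({
    "error": [0.1, 0.4, 0.1, -0.2, -0.1],
    "sample_id": [1, 2, 3, 4, 5],
    "category": ["A", "A", "B", "B", "C"],
})

# Pick one colormap sample per unique category.
categories = sorted(df["category"].unique())
cmap = plt.get_cmap("plasma")
positions = np.linspace(0, 1, len(categories), endpoint=False)
color_dict = {cat: cmap(pos) for cat, pos in zip(categories, positions)}

# Translate each row's category into its color.
bar_colors = [color_dict[cat] for cat in df["category"]]

fig, ax = plt.subplots()
ax.bar(df["sample_id"], df["error"], color=bar_colors)

# Central axis plus dashed lines at +sigma and -sigma.
ax.axhline(0, color="black", linewidth=0.8)
for level in (sigma, -sigma):
    ax.axhline(level, color="gray", linestyle="--")

# Legend with one entry per category.
handles = [Patch(color=color_dict[cat], label=cat) for cat in categories]
ax.legend(handles=handles, title="category")
ax.set_xlabel("sample_id")
ax.set_ylabel("target - prediction")

# Samples whose error falls outside the sigma band.
outliers = df.loc[df["error"].abs() > sigma, "sample_id"].tolist()
```

Call plt.show() or fig.savefig(...) afterwards to view the chart. With the example data, only sample 2 (error 0.4) lies outside the band.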
Hopefully that helps someone some day.