How to arrange bins in stacked histogram, Python

Time:03-01

I am working on code for a stacked histogram and need help arranging the bins in the following order, if that is possible:

0.01 - 0.1, 0.1 - 0.5, 0.5 - 1.0, 1.0 - 2.5, > 2.5

Right now, my histogram looks like this:

[image: Histogram]

with the order of bins being:

0.01 - 0.1, 1.0 - 2.5, > 2.5, 0.1 - 0.5, 0.5 - 1.0

Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = [['0.01 - 0.1','A'],['0.1 - 0.5','B'],['0.5 - 1.0','B'],['0.01 - 0.1','C'],['> 2.5','A'],['1.0 - 2.5','A'],['> 2.5','A']]

df = pd.DataFrame(data, columns = ['Size','Index'])

### HISTOGRAM OF SIZE

df_new = df.sort_values(['Size'])

x_var = 'Size'
groupby_var = 'Index'
df_new_agg = df_new.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [df_new[x_var].values.tolist() for i, df_new in df_new_agg]

list_of_colors_element = ['lightcoral','palegreen','forestgreen']

# Draw
plt.figure(figsize=(16,10), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, len(df_new[x_var].unique()), stacked=True, density=False, color=list_of_colors_element)

# Decorations
plt.legend({group:col for group, col in zip(np.unique(df_new[groupby_var]).tolist(), list_of_colors_element)}, prop={'size': 16})
plt.title("Stacked Histogram of Size colored by element of highest share", fontsize=22)
plt.xlabel(x_var, fontsize=22)
plt.ylabel("Frequency", fontsize=22)
plt.grid(color='black', linestyle='--', linewidth=0.4)
plt.xticks(range(5),fontsize=15)
plt.yticks(fontsize=15)

plt.show()

Any help is appreciated!

CodePudding user response:

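You can count the rows for each Size/Index pair with a pivot table and draw the result as a stacked bar chart. The pivot table sorts its index, and for these labels the sorted order is the order you want: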

piv = df_new.assign(dummy=1) \
            .pivot_table('dummy', 'Size', 'Index', aggfunc='count', fill_value=0) \
            .rename_axis(columns=None)
ax = piv.plot.bar(stacked=True, color=list_of_colors_element, rot=0, width=1)
plt.show()
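
The pivot table orders the bars by sorting the Size labels as strings, which happens to match the requested order for this data. If you would rather not depend on that, one option is to state the order explicitly and reindex the counts. This is only a sketch: `size_order` and `counts` are names I made up, and `pd.crosstab` is swapped in for the pivot table as another way to get the same counts.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = [['0.01 - 0.1', 'A'], ['0.1 - 0.5', 'B'], ['0.5 - 1.0', 'B'],
        ['0.01 - 0.1', 'C'], ['> 2.5', 'A'], ['1.0 - 2.5', 'A'], ['> 2.5', 'A']]
df = pd.DataFrame(data, columns=['Size', 'Index'])

# Desired left-to-right bin order, stated explicitly (illustrative name)
size_order = ['0.01 - 0.1', '0.1 - 0.5', '0.5 - 1.0', '1.0 - 2.5', '> 2.5']

# Count rows per (Size, Index) pair, then force the rows into size_order;
# a label missing from the data would get a row of zeros
counts = pd.crosstab(df['Size'], df['Index']).reindex(size_order, fill_value=0)

ax = counts.plot.bar(stacked=True, rot=0, width=1,
                     color=['lightcoral', 'palegreen', 'forestgreen'])
ax.set_xlabel('Size')
ax.set_ylabel('Frequency')
# Call plt.show() to display the figure in an interactive session
```

Changing `size_order` is then enough to rearrange the bars, whatever the labels look like.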


CodePudding user response:

I think I'd take a different route and represent the input data differently altogether to make the code easier to read.

import matplotlib.pyplot as plt

labels = ['0.01 - 0.1', '0.1 - 0.5', '0.5 - 1.0', '1.0 - 2.5', '> 2.5']
# Counts per size bin for each group, in the same order as labels
A = [1, 0, 0, 1, 2]
B = [0, 1, 1, 0, 0]
C = [1, 0, 0, 0, 0]
width = 1

fig, ax = plt.subplots()

ax.bar(labels, A, width, label='A', color='lightcoral')
ax.bar(labels, B, width, bottom=A, label='B', color='palegreen')
# C sits on top of both A and B, so its bottom is their element-wise sum
ax.bar(labels, C, width, bottom=[a + b for a, b in zip(A, B)], label='C', color='forestgreen')

ax.set_ylabel('Frequency')
ax.set_xlabel('Size')
ax.set_title("Stacked Histogram of Size colored by element of highest share")
ax.legend()

plt.show()
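
The per-group counts in this answer are typed in by hand. If the data already lives in the question's DataFrame, a rough sketch of deriving them and stacking each group on a running total could look like this. The names `colors`, `heights` and `bottom` are mine, not from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = [['0.01 - 0.1', 'A'], ['0.1 - 0.5', 'B'], ['0.5 - 1.0', 'B'],
        ['0.01 - 0.1', 'C'], ['> 2.5', 'A'], ['1.0 - 2.5', 'A'], ['> 2.5', 'A']]
df = pd.DataFrame(data, columns=['Size', 'Index'])

labels = ['0.01 - 0.1', '0.1 - 0.5', '0.5 - 1.0', '1.0 - 2.5', '> 2.5']
colors = {'A': 'lightcoral', 'B': 'palegreen', 'C': 'forestgreen'}

fig, ax = plt.subplots()
bottom = np.zeros(len(labels))  # running height of the stack for each bin
for group, color in colors.items():
    # Count each size label within this group, in the order given by labels
    heights = (df.loc[df['Index'] == group, 'Size']
                 .value_counts()
                 .reindex(labels, fill_value=0)
                 .to_numpy())
    ax.bar(labels, heights, width=1, bottom=bottom, label=group, color=color)
    bottom = bottom + heights  # the next group starts where this one ends

ax.set_xlabel('Size')
ax.set_ylabel('Frequency')
ax.legend()
# Call plt.show() to display the figure in an interactive session
```

Keeping a running `bottom` means every new group is stacked on all the groups drawn before it, not just the first one.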
