Python: Plot histograms with customized bins

Time:01-11

I am using matplotlib.pyplot to make a histogram. Because of how the data is distributed, I want to set up the bins manually. The details are as follows:

  1. Any value equal to 0 goes in one bin;
  2. Any value > 60 goes in the last bin;
  3. Any value > 0 and <= 60 goes in the bins between the two described above, each with a bin size of 5.

Could you please give me some help? Thank you.

CodePudding user response:

I'm not sure what you mean by "the bin size is 5". You can either plot a histogram by specifying the bins as a sequence.

[image: resulting histogram]

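(Note that a value of exactly 60 lands in the last bin when the bins are specified as a sequence, because every bin except the last excludes its right edge. For integer data, changing the sequence to [0, 0.5, 60.5, max(data)] would fix that.)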
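
To check which bin a boundary value such as 60 lands in, here is a rough sketch using np.histogram, which plt.hist relies on for its binning. It assumes NumPy is installed and reuses the sample data from this answer. The three candidate edges (60, 59.5 and 60.5) are only there for comparison, and 60.5 works as a cut-off only because the data are integers.

```python
import numpy as np

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]  # sample data from this answer

# Every bin is half-open [left, right), except the last one, which is closed: [left, right].
# Values outside the outermost edges (here -5) are not counted at all.
results = {}
for upper in (60, 59.5, 60.5):
    counts, _ = np.histogram(data, bins=[0, 0.5, upper, max(data)])
    results[upper] = counts.tolist()
    print(upper, results[upper])
```

With this data, the edges 60 and 59.5 both give counts of 2, 7 and 3 (60 is counted with the values above 60), while 60.5 gives 2, 8 and 2.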

What you (probably) need is first to categorize your data and then plot a bar chart of the categories:

import matplotlib.pyplot as plt
import pandas as pd

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5] # your data here

df = pd.DataFrame()
df['data'] = data

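def find_cat(x):
    # Map a value to the label of the category it belongs to
    if x == 0:
        return "0"
    elif x > 60:
        return "> 60"
    elif x > 0:
        return "> 0 and <= 60"
    return None  # negative values (e.g. -5) get no category and are dropped by groupby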

df['category'] = df['data'].apply(find_cat)
df.groupby('category', as_index=False).count().plot.bar(x='category', y='data', rot=0, width=0.8)
plt.show()

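
If "the bin size is 5" means that the values between 0 and 60 should be split into 5-wide intervals, the same categorize-then-bar-chart idea could be extended roughly as follows. This is only a sketch: find_cat5 and plot_counts are hypothetical helper names, and the interval labels such as "(0, 5]" are an assumed reading of the question.

```python
import math
from collections import Counter

data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -5]  # sample data from this answer


def find_cat5(x, width=5, upper=60):
    """Hypothetical helper: label x as "0", a 5-wide interval, or "> 60"."""
    if x == 0:
        return "0"
    if x > upper:
        return f"> {upper}"
    if x > 0:
        right = math.ceil(x / width) * width  # right edge of the interval containing x
        return f"({right - width}, {right}]"
    return None  # negative values stay uncategorized, as in find_cat


# Fixed label order, so empty intervals still get a slot on the x-axis
labels = ["0"] + [f"({lo}, {lo + 5}]" for lo in range(0, 60, 5)] + ["> 60"]
tally = Counter(find_cat5(x) for x in data)
counts = [tally.get(label, 0) for label in labels]
print(dict(zip(labels, counts)))


def plot_counts(labels, counts):
    """Hypothetical helper: bar chart of the per-category counts."""
    import matplotlib.pyplot as plt

    plt.bar(labels, counts, width=0.8)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
```

Calling plot_counts(labels, counts) draws one bar per category, including the empty ones.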

CodePudding user response:

Building off Tranbi's answer, you could specify the bin edges as detailed in the link they shared.

import matplotlib.pyplot as plt
import pandas as pd
data = [0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -6] # your data here
df = pd.DataFrame()
df['data'] = data

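bin_edges = [-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
# Shift every edge slightly to the right so that each bin effectively becomes (left, right]
bin_edges_offset = [x + 0.000001 for x in bin_edges]
# Values outside the edges (here -6 and 82) are not counted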

plt.figure()
plt.hist(df['data'], bins=bin_edges_offset)
plt.show()
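
With those edges, anything above 65 is left out of the plot. One possible way to get every value > 60 into the last bin is to cap the data at the last edge before binning. The sketch below assumes NumPy is installed, that negative values should be discarded, and that the histogram should still be drawn with plt.hist. The name plot_hist is a hypothetical helper.

```python
import numpy as np

data = np.array([0, 0, 1, 2, 3, 4, 5, 6, 35, 60, 61, 82, -6])  # sample data from the answer

eps = 0.000001
# Same offset edges as above: effectively (-5, 0], (0, 5], ..., (55, 60], (60, 65]
edges = np.arange(-5, 70, 5) + eps

# Assumption: negatives are discarded and everything above 60 belongs in the last bin,
# so values larger than 65 are capped at 65 before binning.
prepared = np.minimum(data[data >= 0], 65)

counts, _ = np.histogram(prepared, bins=edges)
print(counts.tolist())


def plot_hist(values, edges):
    """Hypothetical helper: draw the histogram with matplotlib."""
    import matplotlib.pyplot as plt

    plt.figure()
    plt.hist(values, bins=edges)
    plt.show()
```

Calling plot_hist(prepared, edges) then shows the zeros in the first bar and both 61 and 82 in the last one.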
