Define bar colors of matplotlib's barh based on dataframe column values

Time:05-12

I want to plot my table as a horizontal bar chart using matplotlib's barh function. The table is a pandas DataFrame with this structure:

[Image: screenshot of the DataFrame, with columns test, count and species]

test is the independent variable and count is the dependent variable. I want the bars to have a different color for each species (blue, green, orange, etc.), so I tried the following code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.DataFrame(data={'test': np.linspace(1,16,16).astype(int),
                        'count': np.random.randint(30,120,16),
                        'species': ['species 1','species 1','species 1','species 2','species 1','species 1','species 2','species 2','species 1','species 2','species 2','species 2','species 1','species 1','species 1','species 1']
                       }
                 )

plt.figure(figsize=(14,6))
plt.barh(data=df, y='test', width='count')
# Attempted per-species colors (fails: barh has no color_map parameter)
# plt.barh(data=df, y='test', width='count', color='species', color_map={'species 1':'b', 'species 2':'g'})
plt.xlabel('frequency')
plt.ylabel('test number')

plt.show()

But I can only get the bars to be one color:

[Image: horizontal bar chart in which every bar is the same color]
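For reference, the single color comes from the fact that one barh call draws all of its bars in the same color (the next one from matplotlib's color cycle) unless its color argument is given one color per bar. barh has no color_map parameter, but the idea behind the commented-out attempt still works: map each species to a color with pandas' Series.map and pass the resulting list as color. A minimal sketch, assuming a species-to-color dictionary of your choosing (species_colors below is hypothetical); because all bars come from one call, the legend is built by hand from matplotlib.patches.Patch:

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "test": np.arange(1, 17),
    "count": np.random.randint(30, 120, 16),
    # Same species sequence as in the question
    "species": ["species 1"] * 3 + ["species 2"] + ["species 1"] * 2
               + ["species 2"] * 2 + ["species 1"] + ["species 2"] * 3
               + ["species 1"] * 4,
})

# Hypothetical species -> color mapping (any matplotlib color spec works)
species_colors = {"species 1": "b", "species 2": "g"}

fig, ax = plt.subplots(figsize=(14, 6))
# One color per bar: look up each row's species in the mapping
bars = ax.barh(df["test"], df["count"],
               color=df["species"].map(species_colors).tolist())

# A single barh call produces one BarContainer, so add legend entries manually
handles = [Patch(color=c, label=s) for s, c in species_colors.items()]
ax.legend(handles=handles)
ax.set_xlabel("frequency")
ax.set_ylabel("test number")
plt.show()
```

Unlike drawing each species with its own call, this keeps a single bar container, which can be convenient if you later want to sort or annotate the bars together.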

Answer:

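One way is to make species the index of the DataFrame and then use .loc to plot each species with its own barh call. Each call takes the next color from matplotlib's default color cycle, so the species end up in different colors. I also added a legend, but feel free to remove it: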

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.DataFrame(data={'test': np.linspace(1,16,16).astype(int),
                        'count': np.random.randint(30,120,16),
                        'species': ['species 1','species 1','species 1','species 2','species 1','species 1','species 2','species 2','species 1','species 2','species 2','species 2','species 1','species 1','species 1','species 1']
                       }
                 )

# Set the index to species
df.set_index('species', inplace=True)

# Plot each species separately
plt.barh(data=df.loc['species 1'], y='test', width='count', label='species 1')
plt.barh(data=df.loc['species 2'], y='test', width='count', label='species 2')

# Add a legend to the graph
plt.legend()

plt.xlabel('frequency')
plt.ylabel('test number')

plt.show()

This resulted in the following graph:

[Image: horizontal bar chart with the species 1 and species 2 bars in two different colors, plus a legend]
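The .loc approach above writes out one barh call per species by hand. A variant of the same idea, assuming you would rather not hard-code the species names, loops over DataFrame.groupby('species'): each group still gets its own barh call, so it picks up the next color in the default cycle and a legend entry from label, and it keeps working if more species are added. This is a sketch, not the answer's original code:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "test": np.arange(1, 17),
    "count": np.random.randint(30, 120, 16),
    # Same species sequence as in the question
    "species": ["species 1"] * 3 + ["species 2"] + ["species 1"] * 2
               + ["species 2"] * 2 + ["species 1"] + ["species 2"] * 3
               + ["species 1"] * 4,
})

fig, ax = plt.subplots(figsize=(14, 6))
# One barh call per species; each call takes the next color from the property cycle
for species, group in df.groupby("species"):
    ax.barh(group["test"], group["count"], label=species)

ax.legend()
ax.set_xlabel("frequency")
ax.set_ylabel("test number")
plt.show()
```

No set_index call is needed here, and groupby visits the species in sorted order, which also fixes the order of the legend entries.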
