Python Pandas Plot graphs in percentage

Time:11-23

I have data on states and classes, shown below. I am trying to plot the counts and several different percentages using matplotlib.

import pandas as pd

data = [['FL', 2], ['AR', 0], ['CA', 0], ['CA', 1], ['AR', 1], ['FL', 0], ['CA', 0], ['CA', 1], ['AR', 2], ['FL', 1], 
        ['AR', 0], ['FL', 2], ['CA', 1], ['FL', 1], ['AR', 1], ['AR', 2], ['AR', 1], ['FL', 2], ['CA', 2], ['FL', 0], 
        ['FL', 2], ['AR', 2], ['AR', 1], ['FL', 0], ['AR', 1], ['CA', 0], ['CA', 0], ['CA', 2]]
columns = ['State', 'Class']
df = pd.DataFrame(data=data, columns=columns)
df.groupby(['State', 'Class']).size()  # number of records per (State, Class) pair

The counts were plotted using the code below:

import matplotlib.pyplot as plt
df.groupby(['State', 'Class']).size().to_frame('Size').unstack(level=-1).sort_values([('Size', 0)], ascending=False).plot(kind='bar', rot=45, figsize=(8, 8), fontsize=10)
plt.legend(['Class 0', 'Class 1', 'Class 2'], prop={'size': 15})
plt.show()

Similarly, I need to plot the percentages (state-level, class-level and overall). Here's what I'm looking for:

  • Graph 1: State on the x-axis and the state-level percentage of 'Size' on the y-axis, with bars colored by class (i.e., 3 bars per state)
  • Graph 2: Class on the x-axis and the class-level percentage of 'Size' on the y-axis, with bars colored by state (i.e., 51 bars per class)
  • Graph 3: same as Graph 1, with the overall percentage of 'Size' on the y-axis.

Could you please help me find an elegant way to do this?

My current solution is shown in the pictures below. I still need to plot StatePercent, ClassPercent and OverallPercent (Pic#3). Ideally, I would generate the three graphs without creating columns, the same way 'Size' is plotted (Pic#2). The actual DataFrame has hundreds of other columns, so I'd rather not create unnecessary ones if possible; these three columns are needed only for the statistics, not for any of the processing.

Pic#1 - the dataframe

Pic#2 - Plotting 'Size' without creating a column in df

Pic#3 - Need to plot StatePercent, ClassPercent and OverallPercent values

Answer:

I assume your terms are defined as follows:

  1. State-level percentage of a state S and a class C = 100 * (count of records for state S and class C) / (count of records for state S and all classes)
  2. Class-level percentage of a state S and a class C = 100 * (count of records for state S and class C) / (count of records for class C and all states)
  3. Overall percentage of a state S and a class C = 100 * (count of records for state S and class C) / (count of records for all states and all classes)
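As a quick sanity check of the denominators, the sketch below computes all three percentages for a single cell, (AR, class 0), using only the standard library. It assumes the denominators that df1, df2 and df3 below actually use: the state's total over all classes for the state-level share, the class's total over all states for the class-level share, and the grand total for the overall share. The helper `shares` is a hypothetical name used only for illustration.

```python
from collections import Counter

# Sample data from the question: (State, Class) pairs
data = [['FL', 2], ['AR', 0], ['CA', 0], ['CA', 1], ['AR', 1], ['FL', 0], ['CA', 0], ['CA', 1], ['AR', 2], ['FL', 1],
        ['AR', 0], ['FL', 2], ['CA', 1], ['FL', 1], ['AR', 1], ['AR', 2], ['AR', 1], ['FL', 2], ['CA', 2], ['FL', 0],
        ['FL', 2], ['AR', 2], ['AR', 1], ['FL', 0], ['AR', 1], ['CA', 0], ['CA', 0], ['CA', 2]]

pair_counts = Counter((s, c) for s, c in data)  # records per (state, class)
state_counts = Counter(s for s, _ in data)      # records per state, all classes
class_counts = Counter(c for _, c in data)      # records per class, all states
total = len(data)                               # all states and all classes


def shares(state, cls):
    """Hypothetical helper: (state-level, class-level, overall) percentages for one cell."""
    n = pair_counts[(state, cls)]
    return (100 * n / state_counts[state],
            100 * n / class_counts[cls],
            100 * n / total)


print(shares('AR', 0))  # state-level 20.0, class-level about 22.2, overall about 7.1
```

AR has 10 records, 2 of them in class 0; class 0 has 9 records across all states; there are 28 records in total. So the same count of 2 gives three different percentages depending on the denominator.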

You can use apply() and sum(), as in the following example:

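# fill_value=0 keeps a missing (State, Class) combination from becoming NaN and blanking its whole row
df0 = df.groupby(['State', 'Class']).size().to_frame('Size')
df1 = df0.unstack(level=1, fill_value=0).apply(lambda row: 100*row/row.sum(), axis=1)  # rows: State, each row sums to 100
df2 = df0.unstack(level=0, fill_value=0).apply(lambda row: 100*row/row.sum(), axis=1)  # rows: Class, each row sums to 100
df3 = 100*df0.unstack(level=1, fill_value=0)/df0.sum().sum()                           # rows: State, all cells sum to 100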
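If an `apply` is not required, `pd.crosstab` with its `normalize` parameter may be a more compact route. This is sketched here as an alternative, not as part of the answer above: `normalize='index'` divides each row by its row total, and `normalize='all'` divides every cell by the grand total. Nothing is added to `df`; the names `state_pct`, `class_pct` and `overall_pct` are illustrative.

```python
import pandas as pd

data = [['FL', 2], ['AR', 0], ['CA', 0], ['CA', 1], ['AR', 1], ['FL', 0], ['CA', 0], ['CA', 1], ['AR', 2], ['FL', 1],
        ['AR', 0], ['FL', 2], ['CA', 1], ['FL', 1], ['AR', 1], ['AR', 2], ['AR', 1], ['FL', 2], ['CA', 2], ['FL', 0],
        ['FL', 2], ['AR', 2], ['AR', 1], ['FL', 0], ['AR', 1], ['CA', 0], ['CA', 0], ['CA', 2]]
df = pd.DataFrame(data=data, columns=['State', 'Class'])

# Graph 1: rows = State, columns = Class; each row is divided by the state's total
state_pct = 100 * pd.crosstab(df['State'], df['Class'], normalize='index')
# Graph 2: rows = Class, columns = State; each row is divided by the class's total
class_pct = 100 * pd.crosstab(df['Class'], df['State'], normalize='index')
# Graph 3: rows = State, columns = Class; every cell is divided by the grand total
overall_pct = 100 * pd.crosstab(df['State'], df['Class'], normalize='all')

print(state_pct.round(1))
```

`crosstab` reports a missing State/Class combination as 0 rather than NaN, so with 51 states a state that lacks one class should still get a complete row.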

You can then plot df1, df2 and df3 (for example with .plot(kind='bar')) for cases 1, 2 and 3, respectively.
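One possible way to draw all three graphs in a single figure, assuming matplotlib subplots are acceptable, is sketched below. It rebuilds df1, df2 and df3 in the same way as the answer (with `fill_value=0` added as a precaution) and selects `['Size']` so the legend shows plain class or state labels instead of tuple labels that include 'Size'. The figure size and titles are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [['FL', 2], ['AR', 0], ['CA', 0], ['CA', 1], ['AR', 1], ['FL', 0], ['CA', 0], ['CA', 1], ['AR', 2], ['FL', 1],
        ['AR', 0], ['FL', 2], ['CA', 1], ['FL', 1], ['AR', 1], ['AR', 2], ['AR', 1], ['FL', 2], ['CA', 2], ['FL', 0],
        ['FL', 2], ['AR', 2], ['AR', 1], ['FL', 0], ['AR', 1], ['CA', 0], ['CA', 0], ['CA', 2]]
df = pd.DataFrame(data=data, columns=['State', 'Class'])

# Same tables as df1, df2, df3 in the answer; ['Size'] drops the outer column level
df0 = df.groupby(['State', 'Class']).size().to_frame('Size')
df1 = df0.unstack(level=1, fill_value=0)['Size'].apply(lambda row: 100 * row / row.sum(), axis=1)
df2 = df0.unstack(level=0, fill_value=0)['Size'].apply(lambda row: 100 * row / row.sum(), axis=1)
df3 = 100 * df0.unstack(level=1, fill_value=0)['Size'] / df0['Size'].sum()

fig, axes = plt.subplots(3, 1, figsize=(8, 15))
titles = ['Graph 1: state-level %', 'Graph 2: class-level %', 'Graph 3: overall %']
for ax, frame, title in zip(axes, [df1, df2, df3], titles):
    # One bar group per row label, one colored bar per column
    frame.plot(kind='bar', rot=45, fontsize=10, ax=ax, title=title)
    ax.set_ylabel('Percent of records')
fig.tight_layout()
plt.show()
```

With the real data, each class group in Graph 2 will contain 51 bars, so a wider figure (or `kind='barh'`) may be easier to read.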
