I have State and Class data as shown below, and I am trying to plot the totals and several percentages using matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
data = [['FL', 2], ['AR', 0], ['CA', 0], ['CA', 1], ['AR', 1], ['FL', 0], ['CA', 0], ['CA', 1], ['AR', 2], ['FL', 1],
['AR', 0], ['FL', 2], ['CA', 1], ['FL', 1], ['AR', 1], ['AR', 2], ['AR', 1], ['FL', 2], ['CA', 2], ['FL', 0],
['FL', 2], ['AR', 2], ['AR', 1], ['FL', 0], ['AR', 1], ['CA', 0], ['CA', 0], ['CA', 2]]
columns = ['State', 'Class']
df = pd.DataFrame(data=data, columns=columns)
df.groupby(['State', 'Class']).size()
The counts were plotted using the code below:
df.groupby(['State', 'Class']).size().to_frame('Size').unstack(level=-1).sort_values([('Size', 0)], ascending=False).plot(kind='bar', rot=45, figsize=(8,8), fontsize=10)
plt.legend(['Class 0', 'Class 1', 'Class 2'], prop ={'size' : 15})
Similarly, percentages (state level, class level and overall) need to be plotted as well. Here's what I'm looking for:
- Graph 1 : State on x axis and State level percentage of 'Size' on y axis with bars colored for each class (i.e., 3 bars per state)
- Graph 2 : Class on x axis and Class level percentage of 'Size' on y axis with bars colored for each state (i.e., 51 bars per class)
- Graph 3 : same as Graph 1 with Overall percentage of 'Size' on y axis.
Could you please suggest an elegant way to do this?
Below is my current solution. I need to plot StatePercent, ClassPercent and OverallPercent (Pic#3). Ideally I would generate the three graphs without creating columns, the way it is done for 'Size' (Pic#2). (The actual data frame has hundreds of other columns, so I would like to avoid creating unnecessary ones if possible. These three columns are needed only for the stats, not for any of the processing.)
Pic#2 - Plotting 'Size' without creating a column in df
Pic#3 - Need to plot StatePercent, ClassPercent and OverallPercent values as calculated below
Answer:
I assume your terms are defined as follows.
- State level percentage of a state S and a class C = 100 * (count of records for the state S and the class C) / (count of records for the state S and all classes)
- Class level percentage of a state S and a class C = 100 * (count of records for the state S and the class C) / (count of records for the class C and all states)
- Overall percentage of a state S and a class C = 100 * (count of records for the state S and the class C) / (count of records for all states and all classes)
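As a cross-check on the three denominators, here is a minimal sketch that computes the same kind of tables with pd.crosstab and its normalize argument. This is an alternative I am swapping in for illustration, not the approach used in the rest of this answer, and the names within_state, within_class and overall are made up for the example. It does not add any columns to df.

```python
import pandas as pd

data = [['FL', 2], ['AR', 0], ['CA', 0], ['CA', 1], ['AR', 1], ['FL', 0], ['CA', 0],
        ['CA', 1], ['AR', 2], ['FL', 1], ['AR', 0], ['FL', 2], ['CA', 1], ['FL', 1],
        ['AR', 1], ['AR', 2], ['AR', 1], ['FL', 2], ['CA', 2], ['FL', 0], ['FL', 2],
        ['AR', 2], ['AR', 1], ['FL', 0], ['AR', 1], ['CA', 0], ['CA', 0], ['CA', 2]]
df = pd.DataFrame(data, columns=['State', 'Class'])

# Rows are states, columns are classes in all three tables.
# normalize='index': divide by the state's total, so each row sums to 100.
within_state = 100 * pd.crosstab(df['State'], df['Class'], normalize='index')
# normalize='columns': divide by the class's total, so each column sums to 100.
within_class = 100 * pd.crosstab(df['State'], df['Class'], normalize='columns')
# normalize='all': divide by the grand total, so the whole table sums to 100.
overall = 100 * pd.crosstab(df['State'], df['Class'], normalize='all')

print(within_state.round(1))
print(within_class.round(1))
print(overall.round(1))
```

For example, AR has 2 records of class 0 out of 10 AR records and out of 9 class-0 records, so the same cell is 20% in the first table and about 22.2% in the second.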
You can use apply() and sum() as in the following example.
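# Counts per (State, Class) pair
df0 = df.groupby(['State', 'Class']).size().to_frame('Size')
# Rows are states; each row is divided by its own total (fill_value=0 avoids NaN for missing pairs)
df1 = df0.unstack(level=1, fill_value=0).apply(lambda row: 100*row/row.sum(), axis=1)
# Rows are classes; each row is divided by its own total
df2 = df0.unstack(level=0, fill_value=0).apply(lambda row: 100*row/row.sum(), axis=1)
# Rows are states; every cell is divided by the grand total
df3 = 100*df0.unstack(level=1, fill_value=0)/df0.sum().sum()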
You can plot df1, df2 and df3 for Graphs 1, 2 and 3 respectively.
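A rough sketch of the plotting step, assuming the same sample data as in the question. To keep the legend labels short, the counts here are unstacked from the size() Series directly and normalized with div() rather than apply(); that is my own variation, and the figure layout and axis labels are illustrative choices, not something the question prescribes.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [['FL', 2], ['AR', 0], ['CA', 0], ['CA', 1], ['AR', 1], ['FL', 0], ['CA', 0],
        ['CA', 1], ['AR', 2], ['FL', 1], ['AR', 0], ['FL', 2], ['CA', 1], ['FL', 1],
        ['AR', 1], ['AR', 2], ['AR', 1], ['FL', 2], ['CA', 2], ['FL', 0], ['FL', 2],
        ['AR', 2], ['AR', 1], ['FL', 0], ['AR', 1], ['CA', 0], ['CA', 0], ['CA', 2]]
df = pd.DataFrame(data, columns=['State', 'Class'])

# Count table: rows are states, columns are classes. No columns are added to df.
counts = df.groupby(['State', 'Class']).size().unstack(level='Class', fill_value=0)

df1 = 100 * counts.div(counts.sum(axis=1), axis=0)    # each state row sums to 100
df2 = 100 * counts.T.div(counts.sum(axis=0), axis=0)  # each class row sums to 100
df3 = 100 * counts / counts.to_numpy().sum()          # whole table sums to 100

fig, axes = plt.subplots(1, 3, figsize=(18, 6))
panels = [
    (df1, 'Percentage within state'),
    (df2, 'Percentage within class'),
    (df3, 'Percentage of all records'),
]
for ax, (table, ylabel) in zip(axes, panels):
    # The index goes on the x axis; one colored bar per column.
    table.plot(kind='bar', rot=45, fontsize=10, ax=ax)
    ax.set_ylabel(ylabel)
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards, depending on where you run it. With 51 states the second panel gets crowded, so a wider figsize or a separate figure per graph may work better.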