How to make a stacked barplot for percentage of three classes per year?

Time:10-11

I need to make a stacked barplot using this dataset (head shown):

import pandas as pd

data = {'model': ['A1', 'A6', 'A1', 'A4', 'A3'],
        'year': [2017, 2016, 2016, 2017, 2019],
        'price': [12500, 16500, 11000, 16800, 17300],
        'transmission': ['Manual', 'Automatic', 'Manual', 'Automatic', 'Manual'],
        'mileage': [15735, 36203, 29946, 25952, 1998],
        'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
        'tax': [150, 20, 30, 145, 145],
        'mpg': [55.4, 64.2, 55.4, 67.3, 49.6],
        'engineSize': [1.4, 2.0, 1.4, 2.0, 1.0]}

df = pd.DataFrame(data)

  model  year  price transmission  mileage fuelType  tax   mpg  engineSize
0    A1  2017  12500       Manual    15735   Petrol  150  55.4         1.4
1    A6  2016  16500    Automatic    36203   Diesel   20  64.2         2.0
2    A1  2016  11000       Manual    29946   Petrol   30  55.4         1.4
3    A4  2017  16800    Automatic    25952   Diesel  145  67.3         2.0
4    A3  2019  17300       Manual     1998   Petrol  145  49.6         1.0

I would like the years (1997-2021) on the x-axis and values from 0 to 100 on the y-axis, representing percentages. The three fuelTypes (Petrol, Diesel and Hybrid) should be shown as yearly proportions.

I've already calculated the percentage of each fuelType per year, and now I need to put it on a graph:

fuel_percentage = round((df.groupby(['year'])['fuelType'].value_counts()/df.groupby('year')['fuelType'].count())*100, 2)

print(fuel_percentage)

Which gives me the following result:

year  fuelType
1997  Petrol      100.00
1998  Petrol      100.00
2002  Petrol      100.00
2003  Diesel       66.67
      Petrol       33.33
2004  Petrol       80.00
      Diesel       20.00
2005  Petrol       71.43
      Diesel       28.57
2006  Petrol       66.67
      Diesel       33.33
2007  Petrol       56.25
      Diesel       43.75
2008  Diesel       66.67
      Petrol       33.33
etc...

My main worry is that, since the result is a Series rather than a DataFrame, I won't be able to use it to make a plot.

Here is an example of the kind of plot I would like (replace players with fuelTypes and the y-axis with percentages): [image]

Thanks for the help!


CodePudding user response:

  • Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3

.groupby & .unstack
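
The code for this step is not shown above, so here is a minimal sketch of one way to build the dfc frame that the plotting call further down uses. It assumes dfc holds one row per year and one column per fuelType. The df below contains only the five sample rows from the question, and value_counts(normalize=True) is swapped in for the question's manual division.

```python
import pandas as pd

# Sample rows from the question (the real dataset is larger)
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# Share of each fuelType within each year, as a percentage
pct = df.groupby('year')['fuelType'].value_counts(normalize=True).mul(100).round(2)

# Move fuelType from the index into columns; missing combinations become 0
dfc = pct.unstack(fill_value=0)

# Stacked bars: one bar per year, one segment per fuelType
ax = dfc.plot(kind='bar', stacked=True, ylabel='Percent(%)', rot=0, figsize=(10, 4))
ax.legend(title='fuelType', bbox_to_anchor=(1, 1), loc='upper left')
```

In a script, finish with matplotlib.pyplot.show() to display the figure. Unstacking is what turns the MultiIndex Series from the question into a DataFrame that plots directly.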


    • Remove xticks=dfc.index to let the plotting API pick the x-axis tick values itself.
    # plot area
    ax = dfc.plot(kind='area', ylabel='Percent(%)', rot=0, figsize=(10, 4), xticks=dfc.index)
    

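
Since the question asks for every year from 1997 to 2021 on the x-axis, one option is to reindex the frame to the full range before plotting, so that years without any cars appear as empty slots. This is a minimal sketch, assuming dfc has years as its index and fuelTypes as columns. The small dfc here is rebuilt by hand from three of the years printed in the question, and filling the missing years with 0 is an illustrative choice.

```python
import pandas as pd

# Three of the years printed in the question, arranged as year x fuelType
dfc = pd.DataFrame(
    {'Diesel': [0.0, 66.67, 20.0], 'Petrol': [100.0, 33.33, 80.0]},
    index=pd.Index([1997, 2003, 2004], name='year'),
)

# Add a row for every year in the requested range; years without data get 0
full = dfc.reindex(range(1997, 2022), fill_value=0)

# Stacked bars with one slot per year and a fixed 0-100 percentage axis
ax = full.plot(kind='bar', stacked=True, ylabel='Percent(%)', rot=90, figsize=(10, 4))
ax.set_xlabel('year')
ax.set_ylim(0, 100)
```

A bar plot treats the index as categories, so each year gets its own slot regardless of whether data exists for it.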
