Stacking bar plot using pandas

Time:04-23

I want to represent my data in the form of a bar plot, as shown in my expected output.

time,date,category
0,2002-05-01,2
1,2002-05-02,0
2,2002-05-03,0
3,2002-05-04,0
4,2002-05-05,0
5,2002-05-06,0
6,2002-05-07,0
7,2002-05-08,2
8,2002-05-09,2
9,2002-05-10,0
10,2002-05-11,2
11,2002-05-12,0
12,2002-05-13,0
13,2002-05-14,2
14,2002-05-15,2
15,2002-05-16,2
16,2002-05-17,2
17,2002-05-18,2
18,2002-05-19,0
19,2002-05-20,0
20,2002-05-21,1
21,2002-05-22,2
22,2002-05-23,0
23,2002-05-24,1
24,2002-05-25,0
25,2002-05-26,0
26,2002-05-27,0
27,2002-05-28,0
28,2002-05-29,1
29,2002-05-30,0

import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')
daily_category = df[['date','category']]
daily_category['weekday'] = pd.to_datetime(daily_category['date']).dt.day_name()
daily_category_plot = daily_category[['weekday','category']]

daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
plt.show()

However, I get the error below:

Traceback (most recent call last):
  File "day_plot.py", line 10, in <module>
    daily_category_plot[['category']].groupby('weekday').count().plot(kind='bar', legend=None)
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/frame.py", line 6525, in groupby
    dropna=dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 533, in __init__
    dropna=self.dropna,
  File "/home/..../.local/lib/python3.6/site-packages/pandas/core/groupby/grouper.py", line 786, in get_grouper
    raise KeyError(gpr)
KeyError: 'weekday'
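
The KeyError most likely comes from selecting [['category']] before grouping: the resulting frame no longer has a weekday column to group on. A minimal sketch of the difference, assuming a tiny frame built from four rows of the sample data above (the name per_day is only illustrative):

```python
import pandas as pd

# Four rows taken from the sample data in the question.
df = pd.DataFrame({
    "date": ["2002-05-01", "2002-05-02", "2002-05-03", "2002-05-08"],
    "category": [2, 0, 0, 2],
})
df["weekday"] = pd.to_datetime(df["date"]).dt.day_name()

# df[["category"]] keeps only the category column, so there is
# no "weekday" column left for groupby to find.
try:
    df[["category"]].groupby("weekday").count()
except KeyError as err:
    print("KeyError:", err)

# Grouping first and selecting the column afterwards works.
per_day = df.groupby("weekday")["category"].count()
print(per_day)
```

This only counts rows per weekday; it does not yet split the counts by category, which a stacked plot needs.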

A further example below, where I extracted the data manually, returns almost the expected output, except that the days are shown as numbers instead of weekday names.

Day,category1,category2,category3
Sunday,0,0,4
Monday,0,0,4
Tuesday,1,1,2
Wednesday,1,4,0
Thursday,0,2,3
Friday,1,1,2
Saturday,0,2,2

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

df = pd.read_csv('df.csv')

ax = df.plot.bar(stacked=True, color=['green', 'red', 'blue'])
ax.set_xticklabels(labels=df.index, rotation=70, rotation_mode="anchor", ha="right")
ax.set_xlabel('')
ax.set_ylabel('Number of days')
plt.show()
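
A likely reason for the numeric tick labels is that Day is read as an ordinary column, so df.index is the default 0 to 6 range that set_xticklabels then prints. A minimal sketch, assuming the table above is the content of df.csv (inlined here through io.StringIO so the block runs on its own):

```python
import io
import pandas as pd

# The table from the question, inlined so the sketch runs without df.csv.
csv_text = """Day,category1,category2,category3
Sunday,0,0,4
Monday,0,0,4
Tuesday,1,1,2
Wednesday,1,4,0
Thursday,0,2,3
Friday,1,1,2
Saturday,0,2,2
"""

# index_col="Day" makes the day names the index instead of 0..6.
df = pd.read_csv(io.StringIO(csv_text), index_col="Day")
print(df.index.tolist())

# df.plot.bar(stacked=True) would now label the x axis with the day names.
```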

CodePudding user response:

import pandas as pd
import matplotlib.pyplot as plt

d = """0,2002-05-01,2  1,2002-05-02,0  2,2002-05-03,0  3,2002-05-04,0  4,2002-05-05,0  5,2002-05-06,0  6,2002-05-07,0  7,2002-05-08,2  8,2002-05-09,2  9,2002-05-10,0  10,2002-05-11,2  11,2002-05-12,0  12,2002-05-13,0  13,2002-05-14,2  14,2002-05-15,2  15,2002-05-16,2  16,2002-05-17,2  17,2002-05-18,2  18,2002-05-19,0  19,2002-05-20,0  20,2002-05-21,1  21,2002-05-22,2  22,2002-05-23,0  23,2002-05-24,1  24,2002-05-25,0  25,2002-05-26,0  26,2002-05-27,0  27,2002-05-28,0  28,2002-05-29,1  29,2002-05-30,0"""
df = pd.DataFrame([v.split(',') for v in d.split('  ')], columns=['time', 'date', 'category'])
df.time, df.category = df.time.astype(int), df.category.astype(int)

data = df.copy()
data['weekday'] = pd.to_datetime(data['date']).dt.day_name()
data.drop(columns=['time', 'date'], inplace=True)

weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
categories = sorted(list(set(df.category)))
counts = pd.DataFrame(0, index=weekdays, columns=categories)
for weekday, category in zip(data.weekday, data.category):
    counts.loc[weekday, category] += 1

counts.plot.bar(stacked=True);
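
The counting loop can probably be replaced by pd.crosstab, which tallies each (weekday, category) pair directly. A minimal sketch, assuming the same sample data; the dates are regenerated with pd.date_range instead of being parsed from the string, which is a shortcut taken only for this example:

```python
import pandas as pd

# Category values copied from the sample data (2002-05-01 to 2002-05-30).
categories = [2, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 0, 2, 2,
              2, 2, 2, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 1, 0]
data = pd.DataFrame({
    "date": pd.date_range("2002-05-01", periods=len(categories), freq="D"),
    "category": categories,
})
data["weekday"] = data["date"].dt.day_name()

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# crosstab counts every (weekday, category) pair; reindex puts the rows
# in calendar order and fills any day that never occurs with 0.
counts = pd.crosstab(data["weekday"], data["category"]).reindex(weekdays, fill_value=0)
print(counts)

# To draw the chart: counts.plot.bar(stacked=True)
```

Reindexing by name, rather than relying on row order, keeps each label attached to its own counts.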


CodePudding user response:

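This solution uses groupby on two columns and reshapes the resulting DataFrame with pivot. The result can be plotted with plot.bar(), but the index labels are wrong, so the index is replaced with the weekday names. Note that this assignment is positional: it assumes the weekday column holds the numeric day of the week (0 = Monday, for example from dt.weekday) and that all seven days are present. With day names from dt.day_name(), groupby sorts the rows alphabetically and the labels would end up on the wrong bars.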

ans = (df.groupby(["weekday", "category"]) 
         .size()
         .reset_index(name="sum")
         .pivot(index='weekday', columns='category', values='sum')
      )
ans.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
ans.plot.bar(stacked=True)
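
A variant of the same idea that maps labels by value instead of by position; a minimal sketch, assuming weekday is derived with dt.weekday (0 = Monday) and that the dates are regenerated with pd.date_range for the sake of a self-contained example:

```python
import pandas as pd

# Category values copied from the sample data (2002-05-01 to 2002-05-30).
categories = [2, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 0, 2, 2,
              2, 2, 2, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 1, 0]
df = pd.DataFrame({
    "date": pd.date_range("2002-05-01", periods=len(categories), freq="D"),
    "category": categories,
})
# Assumption: weekday is the numeric day of the week (0 = Monday ... 6 = Sunday).
df["weekday"] = df["date"].dt.weekday

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

ans = (df.groupby(["weekday", "category"])
         .size()
         .unstack(fill_value=0)            # categories become columns, 0 instead of NaN
         .reindex(range(7), fill_value=0)  # keep all seven days even if one is absent
         .rename(index=dict(enumerate(weekdays))))  # 0 -> Monday, 1 -> Tuesday, ...
print(ans)

# To draw the chart: ans.plot.bar(stacked=True)
```

Because rename looks each label up by its numeric key, a missing or reordered day cannot shift the names onto the wrong rows.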
