I want to make a bar chart with Matplotlib or Seaborn that shows the number of unique values for each month:
import pandas as pd
import seaborn as sns
date_list = ['jan', 'jan', 'jan', 'jan', 'feb', 'feb', 'feb', 'feb', 'mar', 'mar', 'mar', 'mar']
value_list = ['high', 'high', 'high', 'high', 'medium', 'high', 'low', 'low', 'low', 'low', 'low', 'high']
print(len(date_list))
print(len(value_list))
df = pd.DataFrame({'date': date_list,
'value': value_list})
print(df)
date value
0 jan high
1 jan high
2 jan high
3 jan high
4 feb medium
5 feb high
6 feb low
7 feb low
8 mar low
9 mar low
10 mar low
11 mar high
Unique values:
jan: 1
feb: 3
mar: 2
The graph should be ordered by month.
I know that I can do:
sns.countplot(x = 'date', hue = 'value', data = df)
But this gives me the count of every category in value_list.
CodePudding user response:
You first need to compute the per-month counts yourself, then feed the result to sns.barplot.
To ensure the months are in order (and that all of them are present):
from calendar import month_abbr # use month_name for full names
# or use a hand defined list
order = [m.lower() for m in month_abbr][1:]
# ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
df2 = (df.groupby('date', sort=False)
.agg(**{'unique values': ('value', 'nunique')})
.reindex(order)
.reset_index()
)
sns.barplot(data=df2, x='date', y='unique values')
output: [bar chart with 'date' on the x-axis and 'unique values' on the y-axis]
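If seaborn is not required, the same counts can also be drawn with Matplotlib alone. This is only a minimal sketch, assuming the `date` column holds lowercase three-letter month abbreviations as in the question. The helper names `unique_per_month` and `plot_unique_per_month` are made up for illustration. Unlike the `reindex(order)` approach, it keeps only the months that actually occur in the data.

```python
import pandas as pd

# Calendar order, written out by hand so it does not depend on the locale.
MONTH_ORDER = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
               'jul', 'aug', 'sep', 'oct', 'nov', 'dec']


def unique_per_month(df):
    """Return the number of distinct 'value' entries per month, in calendar order."""
    counts = df.groupby('date')['value'].nunique()
    # Keep only the months present in the data, sorted by calendar position.
    present = [m for m in MONTH_ORDER if m in counts.index]
    return counts.reindex(present)


def plot_unique_per_month(df):
    """Draw the counts as a bar chart and return the Axes (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the counting works without it

    counts = unique_per_month(df)
    fig, ax = plt.subplots()
    ax.bar(counts.index, counts.values)
    ax.set_xlabel('date')
    ax.set_ylabel('unique values')
    return ax


# Same data as in the question.
df = pd.DataFrame({
    'date': ['jan'] * 4 + ['feb'] * 4 + ['mar'] * 4,
    'value': ['high', 'high', 'high', 'high', 'medium', 'high',
              'low', 'low', 'low', 'low', 'low', 'high'],
})
counts = unique_per_month(df)
print(counts)
```

Calling `plot_unique_per_month(df)` and then `matplotlib.pyplot.show()` should display the chart with the bars in jan, feb, mar order.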
CodePudding user response:
EDIT: Solution for plotting multiple DataFrames on the same x-axis:
df2 = pd.DataFrame({'date': date_list[:5],
'value': value_list[:5]})
print(df2)
dfs = [df, df2]  # df is the DataFrame from the question
df = (pd.concat(dfs, keys=range(len(dfs))).rename_axis(('no', 'i'))
.reset_index()
.drop_duplicates(['no', 'date', 'value']))
print(df)
no i date value
0 0 0 jan high
4 0 4 feb medium
5 0 5 feb high
6 0 6 feb low
8 0 8 mar low
11 0 11 mar high
12 1 0 jan high
16 1 4 feb medium
# df1.plot.bar(x='date', y='count nunique')
sns.countplot(x = 'date', hue = 'no', data = df)
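The reason `drop_duplicates` is enough here: once duplicate (`no`, `date`, `value`) rows are gone, counting rows per group is the same as counting unique values, and row counts are what `countplot` draws. Below is a small pandas-only check of that equivalence, assuming `df1` is the DataFrame from the question. The names `stacked`, `deduped`, `row_counts` and `unique_counts` are only illustrative.

```python
import pandas as pd

date_list = ['jan'] * 4 + ['feb'] * 4 + ['mar'] * 4
value_list = ['high', 'high', 'high', 'high', 'medium', 'high',
              'low', 'low', 'low', 'low', 'low', 'high']

# Assumption: df1 is the question's data, df2 is its first five rows.
df1 = pd.DataFrame({'date': date_list, 'value': value_list})
df2 = pd.DataFrame({'date': date_list[:5], 'value': value_list[:5]})

# Stack both frames and label each row with the frame it came from.
stacked = (pd.concat([df1, df2], keys=[0, 1])
             .rename_axis(['no', 'i'])
             .reset_index())

# Rows left per (no, date) after removing duplicates ...
deduped = stacked.drop_duplicates(['no', 'date', 'value'])
row_counts = deduped.groupby(['no', 'date']).size()

# ... compared with the number of unique values per (no, date) in the full data.
unique_counts = stacked.groupby(['no', 'date'])['value'].nunique()

print(row_counts.to_dict() == unique_counts.to_dict())
```

If the two dictionaries match, each bar height in the `countplot` is the number of unique values for that frame and month.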