Plot barchart of number of unique values in each Pandas group

Time:02-14

I want to make a bar chart with Matplotlib or Seaborn that shows the number of unique values in each month:

import pandas as pd

date_list = ['jan', 'jan', 'jan', 'jan', 'feb', 'feb', 'feb', 'feb', 'mar', 'mar', 'mar', 'mar']
value_list = ['high', 'high', 'high', 'high', 'medium', 'high', 'low', 'low', 'low', 'low', 'low', 'high']

print(len(date_list))
print(len(value_list))

df = pd.DataFrame({'date': date_list,
                  'value': value_list})
                  
print(df)

    date   value
0   jan    high
1   jan    high
2   jan    high
3   jan    high
4   feb  medium
5   feb    high
6   feb     low
7   feb     low
8   mar     low
9   mar     low
10  mar     low
11  mar    high



Unique values:

jan: 1
feb: 3
mar: 2

The graph should be ordered by month.

I know that I can do:

import seaborn as sns

sns.countplot(x='date', hue='value', data=df)

But this gives me the count of every category in value_list, not the number of unique values per month.

CodePudding user response:

You need to compute the groups yourself first, then feed the result to sns.barplot.

To ensure the months are in order (and that all of them are present):

from calendar import month_abbr  # use month_name for full names

import seaborn as sns

# or use a hand-defined list
order = [m.lower() for m in month_abbr][1:]
# ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

df2 = (df.groupby('date', sort=False)
         .agg(**{'unique values': ('value', 'nunique')})
         .reindex(order)
         .reset_index()
      )
sns.barplot(data=df2, x='date', y='unique values')

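
Because `order` lists all twelve months, `reindex(order)` leaves NaN for every month that has no rows. The following is a minimal sketch of two ways to handle that, assuming the question's data; the names `counts`, `all_months` and `present_only` are illustrative and not part of the answer above.

```python
from calendar import month_abbr

import pandas as pd

date_list = ['jan'] * 4 + ['feb'] * 4 + ['mar'] * 4
value_list = ['high', 'high', 'high', 'high', 'medium', 'high',
              'low', 'low', 'low', 'low', 'low', 'high']
df = pd.DataFrame({'date': date_list, 'value': value_list})

# Lowercase month abbreviations in calendar order (assumes an English locale).
order = [m.lower() for m in month_abbr][1:]

# Number of unique values per month, reindexed to calendar order.
counts = df.groupby('date', sort=False)['value'].nunique().reindex(order)

# Months without data are NaN after reindex; pick one treatment.
all_months = counts.fillna(0).astype(int)    # keep all 12 months, empty ones as 0
present_only = counts.dropna().astype(int)   # keep only months that have data

print(present_only)
```

Either Series can be turned back into a DataFrame with `reset_index(name='unique values')` and passed to `sns.barplot` as in the answer.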

CodePudding user response:

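Count the unique values per month and plot them as bars (the original snippet was an image and is not preserved).

Or remove duplicate date/value pairs first, then count the rows per month (snippet likewise not preserved).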
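
The snippets for these two options did not survive on this page. What follows is only a guess at their shape, assuming the first counts distinct values per group and the second uses `drop_duplicates` as in the EDIT further down; the variable names are illustrative.

```python
import pandas as pd

date_list = ['jan'] * 4 + ['feb'] * 4 + ['mar'] * 4
value_list = ['high', 'high', 'high', 'high', 'medium', 'high',
              'low', 'low', 'low', 'low', 'low', 'high']
df = pd.DataFrame({'date': date_list, 'value': value_list})

# Option 1: count distinct values per month directly.
# sort=False keeps the months in order of first appearance.
nunique_counts = df.groupby('date', sort=False)['value'].nunique()

# Option 2: drop repeated (date, value) pairs, then count the remaining rows.
dedup = df.drop_duplicates(['date', 'value'])
dedup_counts = dedup.groupby('date', sort=False).size()

print(nunique_counts)
print(dedup_counts)

# Plotting, assuming matplotlib / seaborn are installed:
# nunique_counts.plot.bar()
# sns.countplot(x='date', data=dedup)
```

Both routes give the same per-month numbers; the second keeps one row per distinct value, which is the shape `sns.countplot` expects.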

EDIT: Solution for plotting multiple DataFrames on the same x-axis:

df2 = pd.DataFrame({'date': date_list[:5],
                  'value': value_list[:5]})
                  
print(df2)

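df1 = df.copy()  # df1 is assumed to be the question's original DataFrame
dfs = [df1, df2]

# tag each row with its source DataFrame ('no'), then keep one row
# per distinct (source, date, value) combination
df = (pd.concat(dfs, keys=range(len(dfs))).rename_axis(('no', 'i'))
        .reset_index()
        .drop_duplicates(['no', 'date', 'value']))

print(df)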
    no   i date   value
0    0   0  jan    high
4    0   4  feb  medium
5    0   5  feb    high
6    0   6  feb     low
8    0   8  mar     low
11   0  11  mar    high
12   1   0  jan    high
16   1   4  feb  medium

# df1.plot.bar(x='date', y='count nunique')
sns.countplot(x = 'date', hue = 'no', data = df)
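
To check the bar heights that this countplot draws, the same numbers can be computed as a table. This is a sketch under the assumption that `df1` is the question's DataFrame and `df2` its first five rows; `combined` and `per_source` are hypothetical names.

```python
import pandas as pd

date_list = ['jan'] * 4 + ['feb'] * 4 + ['mar'] * 4
value_list = ['high', 'high', 'high', 'high', 'medium', 'high',
              'low', 'low', 'low', 'low', 'low', 'high']
df1 = pd.DataFrame({'date': date_list, 'value': value_list})
df2 = df1.iloc[:5]  # assumed second DataFrame: the first five rows

# Stack both frames and record the source in a 'no' column.
combined = (pd.concat([df1, df2], keys=range(2))
              .rename_axis(['no', 'i'])
              .reset_index())

# Unique values per (source, month), one column per source.
per_source = (combined.groupby(['no', 'date'], sort=False)['value']
                      .nunique()
                      .unstack('no')
                      .reindex(['jan', 'feb', 'mar'])  # explicit month order
                      .fillna(0)                       # month absent in a source
                      .astype(int))

print(per_source)
```

If matplotlib is available, `per_source.plot.bar()` should draw one group of bars per month with one bar per source.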