Home > Software engineering >  How can I split my summarized Python table by unique value that is part of Index into a graph, chart
How can I split my summarized Python table by unique value that is part of Index into a graph, chart

Time:06-07

I have been trying to find a method that works for what I want, but haven't had much luck. I have a large dataframe that contains information about different type of documents' statuses.

I started with something of this format:

import pandas as pd
import numpy as np
import random

list = ['Up to Date', 'Expiring Soon', 'Expired']

y = pd.DataFrame(np.random.choice(list, size=(60,6)))
y

I created several data tables grouping the information by:

a = z.groupby('Primary staff')['A Status'].value_counts()

 b = z.groupby('Primary staff')['B Status'].value_counts() etc..

I then concatenated all of the created tables together which resulted in an acceptable table which looks something like:

Primary staff                      | A | B | C | D | E |
Bob                       Expired  | 3 | 2 | 0 | 0 | 1 |
                     Expiring Soon | 10| 9 | 6 | 7 | 2 |
                        Up to Date | 45| 39| 61| 64| 69|
Sally                     Expired  | 1 | 7 | 4 | 0 | 3 |
                     Expiring Soon | 9 | 13| 6 | 2 | 1 |
                        Up to Date | 35| 61| 28| 33| 70| etc.....

I would like to separate these by Primary staff, but in this table the Primary staff is no longer a column it is part of a multi-index. Ideally this would end up as a grouped bar chart or something of the sort for each primary staff keeping the document names on the x-axis and the counts on the y-axis. Any suggestions? Either a table where staff have their own sheet or a chart like the following (pardon my poor Microsoft Paint skills):

enter image description here

This would need to be made for each primary staff and include things like a legend, y axis, etc.

CodePudding user response:

I took your "acceptable table" and did a pandas melt.

import pandas as pd
import hvplot.pandas  # noqa

pd.options.plotting.backend = 'holoviews'

df = pd.DataFrame({'Primary staff': {0: 'Bob',
  1: 'Bob',
  2: 'Bob',
  3: 'Sally',
  4: 'Sally',
  5: 'Sally'},
 'Status': {0: '      Expired ',
  1: ' Expiring Soon',
  2: '    Up to Date',
  3: '      Expired ',
  4: ' Expiring Soon',
  5: '    Up to Date'},
 'A': {0: 3, 1: 10, 2: 45, 3: 1, 4: 9, 5: 35},
 'B': {0: 2, 1: 9, 2: 39, 3: 7, 4: 13, 5: 61},
 'C': {0: 0, 1: 6, 2: 61, 3: 4, 4: 6, 5: 28},
 'D': {0: 0, 1: 7, 2: 64, 3: 0, 4: 2, 5: 33},
 'E': {0: 1, 1: 2, 2: 69, 3: 3, 4: 1, 5: 70}})

Then, using pd.melt:

new_df = pd.melt(
df, id_vars=["Primary staff", "Status"], value_vars=["A", "B", "C", "D", "E"])

Finally, use hvplot

new_df.hvplot.bar(
x="Status", stacked=False, by=["Primary staff","variable"], subplots=True).cols(2)

That returns a bar chart per Primary staff and Status value with the advantage of having interactivity with the plot as well. enter image description here

  • Related