How to structure a pandas dataframe for plotting nested pie/donut charts?

Time:04-12

There is a similar question, but it is dated and its code doesn't work with the current version of pandas.

I'm trying to create a nested pie/donut chart like that one, but with labels. I understand that labeling every level would be absurd, so I'm looking for a way to say that anything under a particular count should be grouped as "Other".

Full code to transform a raw DataFrame -> nested donuts (with a more manageable sample for demonstration):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

WEDGE_SIZE = 0.5
LABEL_THRESHOLD = 1

df = pd.DataFrame({'one': list('AAAAAAAAABBBBBBBCCCC'), 'two': list('DDDDDDEEEFFFGGGGHHII'), 'three': list('JJJKKLLMMMMNNNNNNNNN'), 'four': list('OOPPPPQQRSTTTUUUUVVV'), 'five': list('WWWXXXXXXYYYYYYZZZZZ')}).cumsum(1)

fig, ax = plt.subplots()

# generate MultiIndex of counts with one feature per level
counts = df.value_counts()

# define primary colormaps, indexed by top-level group (the list cycles past 6 groups)
cmaps = np.resize(['Blues_r', 'Greens_r', 'Oranges_r', 'Purples_r', 'Reds_r', 'Greys_r'],
                  counts.index.get_level_values(0).size)

for level in range(len(counts.index.names)):
    # compute grouped sums up to current level
    wedges = counts.groupby(level=list(range(level + 1))).sum()

    # extract annotation labels from MultiIndex
    labels = wedges.index.get_level_values(level)

    # generate color shades per group
    index = [(i,) if level == 0 else i for i in wedges.index.tolist()] # standardize Index vs MultiIndex
    g0 = pd.DataFrame.from_records(index).groupby(0)
    maps = g0.ngroup()
    shades = g0.cumcount() / g0.size().max()
    colors = [plt.get_cmap(cmaps[m])(s) for m, s in zip(maps, shades)]
    
    # plot colorized/labeled donut layer
    ax.pie(x=wedges,
           radius=1 + (level * WEDGE_SIZE),
           colors=colors,
           labels=np.where(wedges >= LABEL_THRESHOLD, labels, ''), # unlabel if under threshold
           rotatelabels=True,
           labeldistance=1.1 - 1.4/(level + 3.5), # put labels inside wedge instead of outside (requires manual tweaking)
           wedgeprops=dict(width=WEDGE_SIZE, linewidth=0, alpha=0.33))

plt.show()
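
To make the shading step less opaque, here is a small sketch that runs the same ngroup/cumcount calls on the second-level wedges of the sample data. The hard-coded `index` list is only an illustration (in the loop it comes from `wedges.index.tolist()`):

```python
import pandas as pd

# second-level wedges of the sample data as (parent, child) tuples
index = [('A', 'AD'), ('A', 'AE'), ('B', 'BF'), ('B', 'BG'), ('C', 'CH'), ('C', 'CI')]

g0 = pd.DataFrame.from_records(index).groupby(0)
maps = g0.ngroup()                        # which colormap each wedge uses (one per parent)
shades = g0.cumcount() / g0.size().max()  # position inside that colormap, in [0, 1)

print(pd.DataFrame({'wedge': [child for _, child in index],
                    'cmap': maps,
                    'shade': shades}))
```

Wedges under the same parent share a colormap number and step through it, so siblings come out as different shades of one hue.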

Note that your sample data maps to a huge number of wedges (outer level = 199 species), so aggregating smaller values as "other" won't really work. The wedges are all basically the same small size, so I'm not sure how this full sample could be reasonably labeled.
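
If your real data does have a few dominant categories, one possible way to lump the rest together is to relabel rare values before counting. This is only a minimal sketch for a single column: `lump_rare` and `min_count` are hypothetical names, not part of the code above, and I haven't tried it on your full sample.

```python
import pandas as pd

def lump_rare(series, min_count, other="Other"):
    """Replace values that occur fewer than min_count times with `other`.

    Hypothetical helper for illustration only.
    """
    freq = series.map(series.value_counts())  # per-row frequency of that row's value
    return series.where(freq >= min_count, other)

s = pd.Series(list("AAAABBBCD"))
lumped = lump_rare(s, min_count=3)
print(lumped.value_counts())
```

Here C and D each occur once, so both end up under "Other" while A and B are kept.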

Figure (image not preserved): full sample on the left, smaller subset on the right.


For reference, these are the outputs from df -> df.value_counts -> groupby.sum.

Original df:

>>> df = pd.DataFrame({'one': list('AAAAAAAAABBBBBBBCCCC'), 'two': list('DDDDDDEEEFFFGGGGHHII'), 'three': list('JJJKKLLMMMMNNNNNNNNN'), 'four': list('OOPPPPQQRSTTTUUUUVVV'), 'five': list('WWWXXXXXXYYYYYYZZZZZ')}).cumsum(1)
>>> df

   one two three  four   five
0    A  AD   ADJ  ADJO  ADJOW
1    A  AD   ADJ  ADJO  ADJOW
2    A  AD   ADJ  ADJP  ADJPW
3    A  AD   ADK  ADKP  ADKPX
4    A  AD   ADK  ADKP  ADKPX
5    A  AD   ADL  ADLP  ADLPX
6    A  AE   AEL  AELQ  AELQX
7    A  AE   AEM  AEMQ  AEMQX
8    A  AE   AEM  AEMR  AEMRX
9    B  BF   BFM  BFMS  BFMSY
10   B  BF   BFM  BFMT  BFMTY
11   B  BF   BFN  BFNT  BFNTY
12   B  BG   BGN  BGNT  BGNTY
13   B  BG   BGN  BGNU  BGNUY
14   B  BG   BGN  BGNU  BGNUY
15   B  BG   BGN  BGNU  BGNUZ
16   C  CH   CHN  CHNU  CHNUZ
17   C  CH   CHN  CHNV  CHNVZ
18   C  CI   CIN  CINV  CINVZ
19   C  CI   CIN  CINV  CINVZ

MultiIndex from df.value_counts:

>>> counts = df.value_counts()
>>> counts

one  two  three  four  five 
A    AD   ADJ    ADJO  ADJOW    2
          ADK    ADKP  ADKPX    2
B    BG   BGN    BGNU  BGNUY    2
C    CI   CIN    CINV  CINVZ    2
A    AD   ADJ    ADJP  ADJPW    1
          ADL    ADLP  ADLPX    1
     AE   AEL    AELQ  AELQX    1
          AEM    AEMQ  AEMQX    1
                 AEMR  AEMRX    1
B    BF   BFM    BFMS  BFMSY    1
                 BFMT  BFMTY    1
          BFN    BFNT  BFNTY    1
     BG   BGN    BGNT  BGNTY    1
                 BGNU  BGNUZ    1
C    CH   CHN    CHNU  CHNUZ    1
                 CHNV  CHNVZ    1

Wedge totals from groupby.sum:

>>> counts.groupby(level=[0]).sum()

one
A    9
B    7
C    4
>>> counts.groupby(level=[0, 1]).sum()

one  two
A    AD     6
     AE     3
B    BF     3
     BG     4
C    CH     2
     CI     2
>>> counts.groupby(level=[0, 1, 2]).sum()

one  two  three
A    AD   ADJ      3
          ADK      2
          ADL      1
     AE   AEL      1
          AEM      2
B    BF   BFM      2
          BFN      1
     BG   BGN      4
C    CH   CHN      2
     CI   CIN      2
>>> counts.groupby(level=[0, 1, 2, 3]).sum()

one  two  three  four
A    AD   ADJ    ADJO    2
                 ADJP    1
          ADK    ADKP    2
          ADL    ADLP    1
     AE   AEL    AELQ    1
          AEM    AEMQ    1
                 AEMR    1
B    BF   BFM    BFMS    1
                 BFMT    1
          BFN    BFNT    1
     BG   BGN    BGNT    1
                 BGNU    3
C    CH   CHN    CHNU    1
                 CHNV    1
     CI   CIN    CINV    2
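
Every one of these grouped sums adds up to the same total (the 20 rows of df), which is why the rings line up when drawn on top of each other. A quick check, sketched here on just the first two columns and without the cumulative label prefixes (so the labels read D, E, ... instead of AD, AE, ..., but the counts are the same):

```python
import pandas as pd

# first two columns of the sample, raw letters only (an assumption to keep this short)
df = pd.DataFrame({'one': list('AAAAAAAAABBBBBBBCCCC'),
                   'two': list('DDDDDDEEEFFFGGGGHHII')})
counts = df.value_counts()

# one Series of wedge sizes per ring, same groupby as in the plotting loop
rings = [counts.groupby(level=list(range(level + 1))).sum()
         for level in range(counts.index.nlevels)]
totals = [int(ring.sum()) for ring in rings]
print(totals)  # each ring covers all 20 rows
```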