Home > Enterprise >  reindex multi level index with missing categories
reindex multi level index with missing categories

Time:10-15

I have a dataframe with two indexes, group and class. I have a dictionary containing additional levels that need to be added in to both those indexes. Specifically I want to add E to the group index. And i want to ensure all g1, g2, and g3 are present in the class index, per group (so add g3 to group A, g1 to group B, g2 and g3 to group C, and g1 and g3 to group D and g1, g2 and g3 to group E. I want to fill where appropriate the total column with zeros

the original dataframe is here

df = pd.DataFrame(data={'group' : ['A','A','B','B','C','D'],
                        'class': ['g1','g2','g2','g3','g1','g2'],
                        'total' : [3,14,12,11,21,9]})

the dictionary (and mapped df) containing all required categories is here

dic = {'group':['A','B','C','D','E'],
       'class' : ['g1','g2','g3']}

and the expected output is here

expectedOutput = pd.DataFrame(data={'group' : ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'],
                        'class': ['g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3'],
                        'total' : [3,14,0, 0,12,11,21,0,0,0,9,0, 0,0,0]})

I am having a trouble maintaining the duplicated values when reindexing, but I need to keep them all. Any advice is welcome, thanks a lot

CodePudding user response:

Solution with MultiIndex - created from dict by MultiIndex.from_product with DataFrame.reindex:

dic = {'group':['A','B','C','D','E'],
       'class' : ['g1','g2','g3']}

mux = pd.MultiIndex.from_product(dic.values(), names=dic)

df = df.set_index(list(dic)).reindex(mux, fill_value=0).reset_index()
print (df)
   group class  total
0      A    g1      3
1      A    g2     14
2      A    g3      0
3      B    g1      0
4      B    g2     12
5      B    g3     11
6      C    g1     21
7      C    g2      0
8      C    g3      0
9      D    g1      0
10     D    g2      9
11     D    g3      0
12     E    g1      0
13     E    g2      0
14     E    g3      0

Or with left join by DataFrame created by itertools.product:

from  itertools import product

dicDf = pd.DataFrame(product(*dic.values()), columns=dic)

df = dicDf.merge(df, how='left').fillna({'total':0})
print (df)
   group class  total
0      A    g1    3.0
1      A    g2   14.0
2      A    g3    0.0
3      B    g1    0.0
4      B    g2   12.0
5      B    g3   11.0
6      C    g1   21.0
7      C    g2    0.0
8      C    g3    0.0
9      D    g1    0.0
10     D    g2    9.0
11     D    g3    0.0
12     E    g1    0.0
13     E    g2    0.0
14     E    g3    0.0

CodePudding user response:

You can use the nice pyjanitor module and its complete method:

# pip install pyjanitor
import janitor as jn 
(df.complete({'group': list(df['group'].unique()) ['D', 'E']}, 'class')
   .fillna(0, downcast='infer')
)

output:

   group class  total
0      A    g1      3
1      A    g2     14
2      A    g3      0
3      B    g1      0
4      B    g2     12
5      B    g3     11
6      C    g1     21
7      C    g2      0
8      C    g3      0
9      D    g1      0
10     D    g2      9
11     D    g3      0
12     E    g1      0
13     E    g2      0
14     E    g3      0
  • Related