I have a dataframe with two indexes, group and class. I have a dictionary containing additional levels that need to be added in to both those indexes. Specifically I want to add E to the group index. And i want to ensure all g1, g2, and g3 are present in the class index, per group (so add g3 to group A, g1 to group B, g2 and g3 to group C, and g1 and g3 to group D and g1, g2 and g3 to group E. I want to fill where appropriate the total column with zeros
the original dataframe is here
df = pd.DataFrame(data={'group' : ['A','A','B','B','C','D'],
'class': ['g1','g2','g2','g3','g1','g2'],
'total' : [3,14,12,11,21,9]})
the dictionary (and mapped df) containing all required categories is here
dic = {'group':['A','B','C','D','E'],
'class' : ['g1','g2','g3']}
and the expected output is here
expectedOutput = pd.DataFrame(data={'group' : ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'],
'class': ['g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3'],
'total' : [3,14,0, 0,12,11,21,0,0,0,9,0, 0,0,0]})
I am having a trouble maintaining the duplicated values when reindexing, but I need to keep them all. Any advice is welcome, thanks a lot
CodePudding user response:
Solution with MultiIndex
- created from dict
by MultiIndex.from_product
with DataFrame.reindex
:
dic = {'group':['A','B','C','D','E'],
'class' : ['g1','g2','g3']}
mux = pd.MultiIndex.from_product(dic.values(), names=dic)
df = df.set_index(list(dic)).reindex(mux, fill_value=0).reset_index()
print (df)
group class total
0 A g1 3
1 A g2 14
2 A g3 0
3 B g1 0
4 B g2 12
5 B g3 11
6 C g1 21
7 C g2 0
8 C g3 0
9 D g1 0
10 D g2 9
11 D g3 0
12 E g1 0
13 E g2 0
14 E g3 0
Or with left join by DataFrame
created by itertools.product
:
from itertools import product
dicDf = pd.DataFrame(product(*dic.values()), columns=dic)
df = dicDf.merge(df, how='left').fillna({'total':0})
print (df)
group class total
0 A g1 3.0
1 A g2 14.0
2 A g3 0.0
3 B g1 0.0
4 B g2 12.0
5 B g3 11.0
6 C g1 21.0
7 C g2 0.0
8 C g3 0.0
9 D g1 0.0
10 D g2 9.0
11 D g3 0.0
12 E g1 0.0
13 E g2 0.0
14 E g3 0.0
CodePudding user response:
You can use the nice pyjanitor module and its complete
method:
# pip install pyjanitor
import janitor as jn
(df.complete({'group': list(df['group'].unique()) ['D', 'E']}, 'class')
.fillna(0, downcast='infer')
)
output:
group class total
0 A g1 3
1 A g2 14
2 A g3 0
3 B g1 0
4 B g2 12
5 B g3 11
6 C g1 21
7 C g2 0
8 C g3 0
9 D g1 0
10 D g2 9
11 D g3 0
12 E g1 0
13 E g2 0
14 E g3 0