Given the following dataframe containing a matrix in tabular format:
values = list(range(5))
var1 = ['01', '02', '03', '0', '0']
var2 = ['a', 'b', 'c', 'd', 'e']
var3 = ['01', '02', '03', '0', '0']
var4 = ['a', 'b', 'c', 'd', 'e']
var5 = ['S1', 'S1','S1', 'S3', 'S2']
var6 = ['P1', 'P1','P1', 'P3', 'P2']
df = pd.DataFrame({'var1': var1,
'var2': var2,
'var3': var3,
'var4': var4,
'var5': var5,
'var6': var6,
'value': values})
And the following imposed order for var5 and var6:
var5_order = ['S1', 'S2', 'S3', 'S4']
var6_order = ['P1', 'P2', 'P3', 'P4']
How to pivot the dataframe in a way that var6, var1, and var2 (in this order) define a row multiindex and var5, var3, and var4 (in this order) define a column multiindex? In addition, how to impose var5_order and var6_order in the pivoted dataframe?
CodePudding user response:
Yes, you can do pd.Categorical
with ordered=True
:
df['var5'] = pd.Categorical(df['var5'], categories=var5_order, ordered=True)
df['var6'] = pd.Categorical(df['var6'], categories=var6_order, ordered=True)
df.pivot_table(index=['var6','var1','var2'],
columns=['var5','var3','var4'],
values='value')
Output:
var5 S1 S2 S3
var3 01 02 03 0 0
var4 a b c e d
var6 var1 var2
P1 01 a 0.0 NaN NaN NaN NaN
02 b NaN 1.0 NaN NaN NaN
03 c NaN NaN 2.0 NaN NaN
P2 0 e NaN NaN NaN 4.0 NaN
P3 0 d NaN NaN NaN NaN 3.0