Pandas pivot_table - How to make a MultiIndex from a mix of column values and column names?

I'm relatively new to Pandas. I have a DataFrame in the form:

         A         B       C            D         E
0        1       1.1       a      23.7853   18.2647
1        1       1.2       a      23.7118   17.2387
2        1       1.1       b      24.1873   17.3874
3        1       1.2       b      23.1873   18.1748
4        2       1.1       a      24.1872   18.1847
...      ...     ...       ...     ...       ...

I would like to pivot it so that the rows have a three-level MultiIndex constructed from the values in columns A and C and the column headers ["D", "E"]. I also want to use the values from B as the new column headers and the data in columns D and E for the values. All values are one-to-one, i.e. each combination of A, B, and C occurs at most once (with some NaNs). If I understand correctly, I need to use pivot_table() instead of just pivot() because of the MultiIndex. Ultimately I want a table that looks like:

B                      1.1       1.2  ...
A    C  col-name
1    a         D   23.7853   23.7118  ...
               E   18.2647   17.2387  ...
     b         D   24.1873   23.1873  ...
               E   17.3874   18.1748  ...
2    a         D   24.1872   23.1987  ...
               E   18.1847   19.2387  ...
...  ...     ...     ...       ...    ...

I'm pretty sure the answer is to use some command like

pd.pivot_table(df, columns=["B"], values=["D","E"], index=["A","C","???"])

I'm unsure what to put in the "values" and "index" arguments to get the right behavior.

If I can't do this with a single pivot_table command, do I need to construct my MultiIndex ahead of time? If so, what is the next step?

Thanks!

CodePudding user response:

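Set columns A, C, and B as a MultiIndex, then reshape with stack() and unstack(). stack() moves the remaining column headers (D and E) into a new innermost index level, and unstack(-2) moves B, now the second-to-last level, out of the index and into the columns: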

df.set_index(['A', 'C', 'B']).stack().unstack(-2)

B          1.1      1.2
A C                    
1 a D  23.7853  23.7118
    E  18.2647  17.2387
  b D  24.1873  23.1873
    E  17.3874  18.1748
2 a D  24.1872      NaN
    E  18.1847      NaN
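The one-liner above can be tried end to end with the sketch below. It assumes only the five rows visible in the question (the rest are elided there), and the level name "col_name" is an arbitrary label added here so the third index level is not left unnamed.

```python
import pandas as pd

# Only the five rows shown in the question; the remaining rows are elided there.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2],
    "B": [1.1, 1.2, 1.1, 1.2, 1.1],
    "C": ["a", "a", "b", "b", "a"],
    "D": [23.7853, 23.7118, 24.1873, 23.1873, 24.1872],
    "E": [18.2647, 17.2387, 17.3874, 18.1748, 18.1847],
})

# stack() moves the remaining column labels (D, E) into a new innermost index level.
stacked = df.set_index(["A", "C", "B"]).stack()

# That new level has no name yet; "col_name" is an arbitrary label chosen here.
stacked = stacked.rename_axis(["A", "C", "B", "col_name"])

# Move B out of the index and into the columns, selecting the level by name.
result = stacked.unstack("B")
print(result)
```

Selecting the level by name in unstack("B") does the same job as the positional unstack(-2), but it keeps working if the order of the index levels changes.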

CodePudding user response:

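You can use pd.pivot_table() together with .stack(). Because each combination of A, C, and B occurs only once, the default aggfunc (mean) leaves the values unchanged: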

(pd.pivot_table(df, index=['A', 'C'], columns='B', values=["D","E"])
   .rename_axis(columns=['col_name', 'B'])         # name the column level that holds "D" and "E"
   .stack(level=0)
)

Result:

B                 1.1      1.2
A C col_name                  
1 a D         23.7853  23.7118
    E         18.2647  17.2387
  b D         24.1873  23.1873
    E         17.3874  18.1748
2 a D         24.1872      NaN
    E         18.1847      NaN
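As for what could go in place of the "???" in the question, one option that neither answer uses is to melt the frame first, so that the column names D and E become ordinary values in a column that pivot_table() can index on. This is only a sketch under the same assumption as before (the five visible rows), and "col_name" is again an arbitrary label.

```python
import pandas as pd

# Only the five rows shown in the question; the remaining rows are elided there.
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 2],
    "B": [1.1, 1.2, 1.1, 1.2, 1.1],
    "C": ["a", "a", "b", "b", "a"],
    "D": [23.7853, 23.7118, 24.1873, 23.1873, 24.1872],
    "E": [18.2647, 17.2387, 17.3874, 18.1748, 18.1847],
})

# melt() turns the D and E columns into rows: the former column name goes into
# "col_name" and the number into "value" (melt's default value column name).
long_df = df.melt(id_vars=["A", "B", "C"], var_name="col_name")

# With the column names available as data, "col_name" can be the third index level.
# aggfunc defaults to mean, which is a no-op here because every cell has one value.
result = long_df.pivot_table(index=["A", "C", "col_name"], columns="B", values="value")
print(result)
```

If the data could contain duplicate (A, C, B) combinations, the default mean would silently average them, so it is worth checking for duplicates first when that matters.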