In pandas, how to pivot a dataframe on a categorical series with missing categories?-CodePudding

I have a pandas dataframe with a categorical series that has missing categories.

In the example shown below, group has the categories "a", "b", and "c", but there are no cases of "c" in the dataframe.

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
dfr.pivot(index="id", columns="group")

The resulting pivoted dataframe has columns a and b. I expected a c column containing all missing value as well.

      value      
group     a     b
id               
111     1.0   9.0
222     4.0   NaN
333     NaN  16.0

How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?

CodePudding user response：

pd.pivot_table has a dropna argument which dictates dropping or not value columns full of NaNs.

Try setting it to False:

import pandas as pd
dfr = pd.DataFrame({
    "id": ["111", "222", "111", "333"], 
    "group": ["a", "a", "b", "b"], 
    "value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
pd.pivot_table(dfr, index="id", columns="group", dropna=False)

CodePudding user response：

You can reindex. This will work even if your value column is not numerical (unlike pivot_table):

output = (dfr.pivot(index="id", columns="group")
             .reindex(columns=pd.MultiIndex.from_product([["value"],
                                                          dfr["group"].cat.categories]
                                                         )
                      )
             )

>>> output
    value          
        a     b   c
id                 
111   1.0   9.0 NaN
222   4.0   NaN NaN
333   NaN  16.0 NaN