I have a pandas dataframe with a categorical series that has missing categories.
In the example shown below, group
has the categories "a"
, "b"
, and "c"
, but there are no cases of "c"
in the dataframe.
import pandas as pd
dfr = pd.DataFrame({
"id": ["111", "222", "111", "333"],
"group": ["a", "a", "b", "b"],
"value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
dfr.pivot(index="id", columns="group")
The resulting pivoted dataframe has columns a
and b
. I expected a c
column containing all missing value as well.
value
group a b
id
111 1.0 9.0
222 4.0 NaN
333 NaN 16.0
How can I pivot a dataframe on a categorical series to include columns with all categories, regardless of whether they were present in the original dataframe?
CodePudding user response:
pd.pivot_table
has a dropna
argument which dictates dropping or not value columns full of NaNs.
Try setting it to False
:
import pandas as pd
dfr = pd.DataFrame({
"id": ["111", "222", "111", "333"],
"group": ["a", "a", "b", "b"],
"value": [1, 4, 9, 16]})
dfr["group"] = pd.Categorical(dfr["group"], categories=["a", "b", "c"])
pd.pivot_table(dfr, index="id", columns="group", dropna=False)
CodePudding user response:
You can reindex
. This will work even if your value
column is not numerical (unlike pivot_table
):
output = (dfr.pivot(index="id", columns="group")
.reindex(columns=pd.MultiIndex.from_product([["value"],
dfr["group"].cat.categories]
)
)
)
>>> output
value
a b c
id
111 1.0 9.0 NaN
222 4.0 NaN NaN
333 NaN 16.0 NaN