I have a DataFrame that I want to filter using pd.CategoricalDtype() and display the result in a bar chart using px.bar.
Before the latest pandas update this worked perfectly, but with the latest update the chart crashes and displays the error below:
Traceback (most recent call last):
  File "", line 1, in
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_chart_types.py", line 373, in bar
    return make_figure(
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_core.py", line 2003, in make_figure
    groups, orders = get_groups_and_orders(args, grouper)
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_core.py", line 1978, in get_groups_and_orders
    groups = {
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_core.py", line 1979, in
    sf: grouped.get_group(s if len(s) > 1 else s[0])
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 811, in get_group
    raise KeyError(name)
KeyError: 'C'
Code:
import pandas as pd
import plotly.express as px
# Code outside px.bar
old_df2 = pd.DataFrame({'name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
'id1': [18, 22, 19, 14, 14, 11, 20, 28],
'id2': [5, 7, 7, 9, 12, 9, 9, 4],
'id3': [11, 8, 10, 6, 6, 7, 9, 12]})
new_df = old_df2.groupby([pd.CategoricalDtype(old_df2.name),'id2'])['id3'].count().fillna(0)
# Transforms count from series to data frame
new_df = new_df.to_frame()
# rowname to index
new_df.reset_index(inplace=True)
new_df = new_df[new_df["level_0"].isin(["A","B"])]
new_df.rename(columns={'level_0': 'name'}, inplace=True)
# Not working here the error
fig_bar = px.bar(new_df.loc[::-1], x="id2", y="id3", color = "name", barmode="group")
# Working version identical data
new_df_list = new_df.to_dict("records")
unlinked_df = pd.DataFrame(new_df_list)
How can I fix this error?
CodePudding user response:
I think you can convert the column to Categorical.
If you need the default behavior, where categories are inferred from the data and are unordered:
new_df = old_df2.groupby([pd.Categorical(old_df2.name),'id2'])['id3'].count().fillna(0)
If you need CategoricalDtype, pass the categories as the unique values of old_df2.name:
from pandas.api.types import CategoricalDtype
cat_type = CategoricalDtype(categories=old_df2.name.unique())
new_df = old_df2.groupby([old_df2.name.astype(cat_type),'id2'])['id3'].count().fillna(0)
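The two options above are not quite interchangeable, so here is a small sketch of the difference. The sample Series is hypothetical (not the data from the question) and is chosen so that first-appearance order differs from sorted order. pd.Categorical builds the categorical values directly and sorts the inferred categories, while CategoricalDtype only describes the type and keeps whatever category order you pass in.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample where first-appearance order differs from sorted order.
names = pd.Series(["B", "A", "B", "C"], name="name")

# pd.Categorical builds a categorical array; categories are inferred and sorted.
inferred = pd.Categorical(names)

# CategoricalDtype only describes the type; astype() applies it to the data.
cat_type = CategoricalDtype(categories=names.unique())
explicit = names.astype(cat_type)

print(list(inferred.categories))      # ['A', 'B', 'C']
print(list(explicit.cat.categories))  # ['B', 'A', 'C']
```

If the category order matters for the chart (for example legend order), the CategoricalDtype variant gives you explicit control over it.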
Also change loc to iloc:
fig_bar = px.bar(new_df.iloc[::-1], x="id2", y="id3", color = "name", barmode="group")
EDIT: I did some research, and the problem is that when you filter on a categorical column, the categories that no longer have any rows are not removed. You can try cat.remove_unused_categories after isin:
old_df2 = pd.DataFrame({'name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
'id1': [18, 22, 19, 14, 14, 11, 20, 28],
'id2': [5, 7, 7, 9, 12, 9, 9, 4],
'id3': [11, 8, 10, 6, 6, 7, 9, 12]})
from pandas.api.types import CategoricalDtype
cat_type = CategoricalDtype(categories=old_df2.name.unique())
new_df = old_df2.groupby([old_df2.name.astype(cat_type),'id2'])['id3'].count().fillna(0)
# rowname to index
new_df = new_df.reset_index()
# .copy() so the later column assignment does not act on a slice of the original frame
new_df = new_df[new_df["name"].isin(["A","B"])].copy()
print (new_df['name'])
# 0 A
# 1 A
# 2 A
# 3 A
# 4 A
# 5 B
# 6 B
# 7 B
# 8 B
# 9 B
# Name: name, dtype: category
# Categories (3, object): ['A', 'B', 'C']
new_df['name'] = new_df['name'].cat.remove_unused_categories()
print (new_df['name'])
# 0 A
# 1 A
# 2 A
# 3 A
# 4 A
# 5 B
# 6 B
# 7 B
# 8 B
# 9 B
# Name: name, dtype: category
# Categories (2, object): ['A', 'B']
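To see why the leftover category matters, here is a minimal pandas-only sketch. The small frame is hypothetical, and the link to plotly is my reading of the traceback rather than something I verified in its source: a group-by on a categorical column still reports the empty 'C' group, which is plausibly the group that get_group then fails on with KeyError: 'C'.

```python
import pandas as pd

# Hypothetical miniature of the filtered frame from the answer.
df = pd.DataFrame({
    "name": pd.Categorical(["A", "A", "B", "C"]),
    "id3": [1, 2, 3, 4],
})
filtered = df[df["name"].isin(["A", "B"])].copy()

# 'C' has no rows left, yet it is still a declared category,
# so grouping with observed=False reports it with size 0.
before = filtered.groupby("name", observed=False).size()
print(before.to_dict())  # {'A': 2, 'B': 1, 'C': 0}

# Dropping unused categories removes the empty group.
filtered["name"] = filtered["name"].cat.remove_unused_categories()
after = filtered.groupby("name", observed=False).size()
print(after.to_dict())  # {'A': 2, 'B': 1}
```

This would also explain why rebuilding the frame with to_dict("records") works in the question: the round trip turns the column back into plain strings, so no unused category survives.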