Pandas latest update filtered Grouped by objects breaks px.bar

Time:11-10

I have a DataFrame that I want to filter using pd.CategoricalDtype() and then display in a bar chart with px.bar.

Before the latest pandas update this worked perfectly, but with the latest update the chart crashes and the error below is displayed:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_chart_types.py", line 373, in bar
    return make_figure(
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_core.py", line 2003, in make_figure
    groups, orders = get_groups_and_orders(args, grouper)
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_core.py", line 1978, in get_groups_and_orders
    groups = {
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/plotly/express/_core.py", line 1979, in
    sf: grouped.get_group(s if len(s) > 1 else s[0])
  File "/home/marco/python-wsl/project_folder/venv/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 811, in get_group
    raise KeyError(name)
KeyError: 'C'

Code:

import pandas as pd
import plotly.express as px

# Code outside px.bar
old_df2 = pd.DataFrame({'name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
                       'id1': [18, 22, 19, 14, 14, 11, 20, 28],
                       'id2': [5, 7, 7, 9, 12, 9, 9, 4],
                       'id3': [11, 8, 10, 6, 6, 7, 9, 12]})


new_df = old_df2.groupby([pd.CategoricalDtype(old_df2.name),'id2'])['id3'].count().fillna(0)
    
# Transforms count from series to data frame
new_df = new_df.to_frame()

# Move the index levels back into regular columns
new_df.reset_index(inplace=True)

new_df = new_df[new_df["level_0"].isin(["A","B"])]

new_df.rename(columns={'level_0': 'name'}, inplace=True)

# Not working: this line raises the error
fig_bar = px.bar(new_df.loc[::-1], x="id2", y="id3", color = "name", barmode="group")

# Working version with identical data
new_df_list = new_df.to_dict("records")

unlinked_df = pd.DataFrame(new_df_list)

How can I fix this error?

CodePudding user response:

I think you can convert the column to Categorical if you need the default behavior, where the categories are inferred from the data and are unordered:

new_df = old_df2.groupby([pd.Categorical(old_df2.name),'id2'])['id3'].count().fillna(0)

If you need CategoricalDtype, pass the unique values of old_df2.name as the categories:

from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=old_df2.name.unique())
new_df = old_df2.groupby([old_df2.name.astype(cat_type),'id2'])['id3'].count().fillna(0)
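The difference between the two variants above can be checked on a small Series. This is only an illustrative sketch; the `names` Series and its values are made up for the example and are not part of the question's data.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical example data, chosen so that the two orderings differ.
names = pd.Series(["B", "A", "A", "C"], name="name")

# pd.Categorical infers the categories from the data and sorts them.
inferred = pd.Categorical(names)
print(list(inferred.categories))  # ['A', 'B', 'C']

# An explicit CategoricalDtype keeps the categories in the order given,
# here the order of first appearance returned by unique().
cat_type = CategoricalDtype(categories=names.unique())
explicit = names.astype(cat_type)
print(list(explicit.cat.categories))  # ['B', 'A', 'C']

# Neither variant is ordered unless ordered=True is requested.
print(inferred.ordered, explicit.cat.ordered)  # False False
```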

Also use iloc instead of loc:

fig_bar = px.bar(new_df.iloc[::-1], x="id2", y="id3", color = "name", barmode="group")

EDIT: After some research, the problem is that filtering on a categorical column does not remove the categories that no longer appear in the data. You can try cat.remove_unused_categories after isin:

old_df2 = pd.DataFrame({'name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
                       'id1': [18, 22, 19, 14, 14, 11, 20, 28],
                       'id2': [5, 7, 7, 9, 12, 9, 9, 4],
                       'id3': [11, 8, 10, 6, 6, 7, 9, 12]})


from pandas.api.types import CategoricalDtype

cat_type = CategoricalDtype(categories=old_df2.name.unique())
new_df = old_df2.groupby([old_df2.name.astype(cat_type),'id2'])['id3'].count().fillna(0)
    
# Move the index levels back into regular columns
new_df = new_df.reset_index()

new_df = new_df[new_df["name"].isin(["A","B"])]

print(new_df['name'])
# 0    A
# 1    A
# 2    A
# 3    A
# 4    A
# 5    B
# 6    B
# 7    B
# 8    B
# 9    B
# Name: name, dtype: category
# Categories (3, object): ['A', 'B', 'C']

new_df['name'] = new_df['name'].cat.remove_unused_categories()

print(new_df['name'])
# 0    A
# 1    A
# 2    A
# 3    A
# 4    A
# 5    B
# 6    B
# 7    B
# 8    B
# 9    B
# Name: name, dtype: category
# Categories (2, object): ['A', 'B']
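The same steps can be put together without the chart, to check the categories before anything is passed to px.bar. This is a sketch under a few assumptions: `observed=False` is passed explicitly so the zero-count combinations are kept regardless of the pandas default, `.copy()` is used so the filtered frame can be modified safely, and the names `counts`, `trimmed` and `as_text` are made up for the example. The second option, casting the column to plain strings, is an alternative not mentioned above; it probably explains why the round trip through to_dict("records") in the question worked, since that also discards the categorical dtype.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

old_df2 = pd.DataFrame({'name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C'],
                        'id1': [18, 22, 19, 14, 14, 11, 20, 28],
                        'id2': [5, 7, 7, 9, 12, 9, 9, 4],
                        'id3': [11, 8, 10, 6, 6, 7, 9, 12]})

cat_type = CategoricalDtype(categories=old_df2['name'].unique())

# Count id3 per (name, id2); observed=False keeps unobserved combinations as 0.
counts = (
    old_df2.groupby([old_df2['name'].astype(cat_type), 'id2'], observed=False)['id3']
    .count()
    .reset_index()
)

# Filtering the rows does not touch the dtype: 'C' is still a category.
filtered = counts[counts['name'].isin(['A', 'B'])].copy()
print(list(filtered['name'].cat.categories))  # ['A', 'B', 'C']

# Option 1: keep the categorical dtype but drop the unused category.
trimmed = filtered.copy()
trimmed['name'] = trimmed['name'].cat.remove_unused_categories()
print(list(trimmed['name'].cat.categories))  # ['A', 'B']

# Option 2 (alternative): drop the categorical dtype entirely.
as_text = filtered.assign(name=filtered['name'].astype(str))
print(isinstance(as_text['name'].dtype, pd.CategoricalDtype))  # False
```

Either `trimmed` or `as_text` no longer carries the unused 'C' category in the color column.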