Home > Back-end >  Reorder level in a multiindex from a pandas pivot_table?
Reorder level in a multiindex from a pandas pivot_table?

Time:04-15

This question was hard to word, sorry for the bad title. I have a multiindex dataframe created from a pivot_table that I have transposed, the indexes are now the columns. I already know how to reorder the outer index by just doing:

df[['Sunday', 'Monday', 'Tuesday', ...]] 

As you can see the outer level is days of the week and the inner level is what I need reordered. I need to have the inner index in the order of 'pre' 'day' 'twi'. Not all days have all inner level values, eg. Sunday has 'pre' and 'twi' but no 'day'. Here is the creation of the pivot_table:

quantile_df = pd.DataFrame(quantile_list, columns=['error', 'weekday', 'sort', 'scaler'])
quantile_df = quantile_df.pivot_table(values='scaler', index='error', columns=['weekday', 'sort'])

last_week = sort_df[sort_df['volume'] > 5000].loc['2022-04-03':'2022-04-09',:].pivot_table(columns=['weekday', 'sort'])

Let me know if more context is needed,

enter image description here

CodePudding user response:

You can use CategoricalDtype. And it's best if you change the column type before the pivot:

# Some sample data
import string
error = list(string.ascii_uppercase[:10])
weekday = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
sort_ = ["pre", "day", "twi"]

idx = pd.MultiIndex.from_product([error, weekday, sort_], names=["error", "weekday", "sort"])
df = pd.DataFrame({"value": np.random.randint(1, 100, len(idx))}, index=idx).reset_index()

# The code
# You can pick whatever day as first day of the week
weekdayDtype = pd.CategoricalDtype(["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"], ordered=True)
sortDtype = pd.CategoricalDtype(["pre", "day", "twi"], ordered=True)

df["weekday"] = df["weekday"].astype(weekdayDtype)
df["sort"] = df["sort"].astype(sortDtype)
df.pivot(index="error", columns=["weekday", "sort"])

CodePudding user response:

EDIT: my answer would sort days of week alphabetically, which is probably not what you're looking for. I think Code Different's answer is way better! The only issue with using pivot tables on CategoricalDtype is that the resulting pivot table will have show ALL categories, not just the ones in your dataframe, which can be an problem for large data sets. (for instance it will pivot all 7 days of the week even if your dataframe only has values for Saturday and Sunday)

I've had the exact same issue in the past and ended up adding in a helper level into the MulitiIndex. It looks a little messy but it will certainly order the columns correctly

first, create a dictionary so that you can map the desired order

sort_order = {'pre':0,'day':1,'twi':2}

second, create a new MultiIndex using the from_arrays method.

we pass in a list of 3 lists. the first and last lists are unchanged from your original pivot table columns. However, the second list is where we use list comprehension to map the new sort order.

new_idx = pd.MultiIndex.from_arrays(
                arrays=
                    [
                        [x[0] for x in last_week.columns], 
                        [sort_order[x[1]] for x in last_week.columns],
                        [x[1] for x in last_week.columns],
                    ],
                    names= ['weekday','sort_helper','sort']
                )

we replace the current index with the new one, then use sort_index on axis 1.

last_week.columns = new_idx
last_week = last_week.sort_index(axis=1)

if desired, we can drop out the helper level:

last_week.columns = last_week.columns.droplevel(1)

hope this helps :)

  • Related