How to reorder rows in pandas dataframe by factor level in python?-CodePudding

I've created a small dataset comparing coffee drink prices per cup size.

When I pivot my dataset the output automatically reorders the index (the 'Size' column) alphabetically.

Is there a way to assign the different sizes a numerical level (e.g. small = 0, medium = 1, large = 2) and reorder the rows this way instead?

I'm know this can be done in R using the forcats library (using fct_relevel for example), but I'm not aware of how to do this in python. I would prefer to keep the solution to using numpy and pandas.

data = {'Item': np.repeat(['Latte', 'Americano', 'Cappuccino'], 3),
        'Size': ['Small', 'Medium', 'Large']*3,
        'Price': [2.25, 2.60, 2.85, 1.95, 2.25, 2.45, 2.65, 2.95, 3.25]
       }

df = pd.DataFrame(data, columns = ['Item', 'Size', 'Price'])
df = pd.pivot_table(df, index = ['Size'], columns = 'Item')
df

#         Price
# Item    Americano Cappuccino  Latte
#   Size            
#  Large       2.45       3.25   2.85
# Medium       2.25       2.95   2.60
#  Small       1.95       2.65   2.25

CodePudding user response：

You can use a Categorical type with ordered=True:

df.index = pd.Categorical(df.index,
                          categories=['Small', 'Medium', 'Large'],
                          ordered=True)
df = df.sort_index()

output:

           Price                 
Item   Americano Cappuccino Latte
Small       1.95       2.65  2.25
Medium      2.25       2.95  2.60
Large       2.45       3.25  2.85

You can access the codes with:

>>> df.index.codes
array([0, 1, 2], dtype=int8)

If this was a Series:

>>> series.cat.codes

CodePudding user response：

One option is to create the categorical, before pivoting; for this case I am using encode_categorical from pyjanitor, primarily for convenience :

# pip install pyjanitor
import pandas as pd
import janitor
(df
 .encode_categorical(Size = (None, 'appearance'))
 .pivot_table(index='Size', columns='Item')
)

           Price                 
Item   Americano Cappuccino Latte
Size                             
Small       1.95       2.65  2.25
Medium      2.25       2.95  2.60
Large       2.45       3.25  2.85

This way, you do not have to bother about sorting, since pivoting implicitly does that. You can skip the pyjanitor and just stick to Pandas only:

(df
 .astype({'Size': pd.CategoricalDtype(categories = ['Small', 'Medium', 'Large'], 
                                      ordered = True)})
 .pivot_table(index='Size', columns='Item')
)

           Price                 
Item   Americano Cappuccino Latte
Size                             
Small       1.95       2.65  2.25
Medium      2.25       2.95  2.60
Large       2.45       3.25  2.85

CodePudding user response：

1st WAY:

pivot_table function sorts rows based on index. Thus, It is better to use lambda function when applying index in pivot_table function. This way, you don't need any further sorting steps(more time consuming) or any third-party library.

df = pd.pivot_table(df, index = (lambda row: 0 if df.loc[row,'Size']=="Small" else 1 if df.loc[row,'Size']=="Medium" else 2), 
                    columns = 'Item')

         Price                 
Item Americano Cappuccino Latte
0         1.95       2.65  2.25
1         2.25       2.95  2.60
2         2.45       3.25  2.85

2nd WAY:

You can also use your own code and then rename and sort the newly-created table:

df = pd.DataFrame(data, columns = ['Item', 'Size', 'Price'])
df = pd.pivot_table(df, index = ['Size'], columns = 'Item')

# rename:
df = df.rename(index= lambda x: 0 if x=="Small" else 1 if x=="Medium" else 2)

#sort:
df = df.sort_index(ascending = True)


         Price                 
Item Americano Cappuccino Latte
0         1.95       2.65  2.25
1         2.25       2.95  2.60
2         2.45       3.25  2.85