I've created a small dataset comparing coffee drink prices per cup size.
When I pivot my dataset the output automatically reorders the index (the 'Size' column) alphabetically.
Is there a way to assign the different sizes a numerical level (e.g. small = 0, medium = 1, large = 2) and reorder the rows this way instead?
I'm know this can be done in R using the forcats library (using fct_relevel for example), but I'm not aware of how to do this in python. I would prefer to keep the solution to using numpy and pandas.
data = {'Item': np.repeat(['Latte', 'Americano', 'Cappuccino'], 3),
'Size': ['Small', 'Medium', 'Large']*3,
'Price': [2.25, 2.60, 2.85, 1.95, 2.25, 2.45, 2.65, 2.95, 3.25]
}
df = pd.DataFrame(data, columns = ['Item', 'Size', 'Price'])
df = pd.pivot_table(df, index = ['Size'], columns = 'Item')
df
# Price
# Item Americano Cappuccino Latte
# Size
# Large 2.45 3.25 2.85
# Medium 2.25 2.95 2.60
# Small 1.95 2.65 2.25
CodePudding user response:
You can use a Categorical
type with ordered=True
:
df.index = pd.Categorical(df.index,
categories=['Small', 'Medium', 'Large'],
ordered=True)
df = df.sort_index()
output:
Price
Item Americano Cappuccino Latte
Small 1.95 2.65 2.25
Medium 2.25 2.95 2.60
Large 2.45 3.25 2.85
You can access the codes with:
>>> df.index.codes
array([0, 1, 2], dtype=int8)
If this was a Series:
>>> series.cat.codes
CodePudding user response:
One option is to create the categorical, before pivoting; for this case I am using encode_categorical
from pyjanitor
, primarily for convenience :
# pip install pyjanitor
import pandas as pd
import janitor
(df
.encode_categorical(Size = (None, 'appearance'))
.pivot_table(index='Size', columns='Item')
)
Price
Item Americano Cappuccino Latte
Size
Small 1.95 2.65 2.25
Medium 2.25 2.95 2.60
Large 2.45 3.25 2.85
This way, you do not have to bother about sorting, since pivoting implicitly does that. You can skip the pyjanitor and just stick to Pandas only:
(df
.astype({'Size': pd.CategoricalDtype(categories = ['Small', 'Medium', 'Large'],
ordered = True)})
.pivot_table(index='Size', columns='Item')
)
Price
Item Americano Cappuccino Latte
Size
Small 1.95 2.65 2.25
Medium 2.25 2.95 2.60
Large 2.45 3.25 2.85
CodePudding user response:
1st WAY:
pivot_table
function sorts rows based on index.
Thus, It is better to use lambda function when applying index in pivot_table function.
This way, you don't need any further sorting steps(more time consuming) or any third-party library.
df = pd.pivot_table(df, index = (lambda row: 0 if df.loc[row,'Size']=="Small" else 1 if df.loc[row,'Size']=="Medium" else 2),
columns = 'Item')
Price
Item Americano Cappuccino Latte
0 1.95 2.65 2.25
1 2.25 2.95 2.60
2 2.45 3.25 2.85
2nd WAY:
You can also use your own code and then rename and sort the newly-created table:
df = pd.DataFrame(data, columns = ['Item', 'Size', 'Price'])
df = pd.pivot_table(df, index = ['Size'], columns = 'Item')
# rename:
df = df.rename(index= lambda x: 0 if x=="Small" else 1 if x=="Medium" else 2)
#sort:
df = df.sort_index(ascending = True)
Price
Item Americano Cappuccino Latte
0 1.95 2.65 2.25
1 2.25 2.95 2.60
2 2.45 3.25 2.85