Home > Back-end >  Add column to pandas df of order per category in another column
Add column to pandas df of order per category in another column

Time:03-10

I would like to add another column to a dataframe in which there would be order (lowest to highest) of prices of different fruits per garden category. In this case, it would be:

[2,1,3, 3,2,1, 1,3,2]

Alternatively, I'd want to show e.g. the best 2 fruits (lowest price) per garden category, eg:

    garden  fruit   price
0   A   peach   4.359949
1   B   peach   0.259262
4   B   apple   4.203678
5   C   apple   3.303348
6   A   plum    2.046486
8   C   plum    2.996547

Example of such a dataframe:

import pandas as pd
import numpy as np

np.random.seed(2)

df_fruits = pd.DataFrame({
    "fruit": np.repeat(np.asarray(["peach", "apple", "plum"]),3),
    "garden": 3*["A", "B", "C"],
    "price": 10*np.random.rand(9),
    
})

df_fruits
    fruit   garden  price
0   peach   A   4.359949
1   peach   B   0.259262
2   peach   C   5.496625
3   apple   A   4.353224
4   apple   B   4.203678
5   apple   C   3.303348
6   plum    A   2.046486
7   plum    B   6.192710
8   plum    C   2.996547

What I tried is for the alternatives to do: df_fruits.sort_values(by=["fruit", "price"]).groupby('fruit').head(2) which is fine and what I want to see but I don't know how to go from this to the extra column I want to add with all the orders. The only way how I figured out how to do that is by saving the index, sorting, adding manually column of order and then sorting back. That is fine if the number of items is the same but fails if in some category some fruit was missing. How do I do it to account for that?

The solution is:

tmp = df_fruits.sort_values(by=["fruit", "price"]).reset_index()
tmp["order"] = 3*[1,2,3]
tmp.sort_values(by='index').drop(columns=['index'])

which yields the desired

fruit   garden  price   order
4   peach   A   4.359949    2
3   peach   B   0.259262    1
5   peach   C   5.496625    3
2   apple   A   4.353224    3
1   apple   B   4.203678    2
0   apple   C   3.303348    1
6   plum    A   2.046486    1
8   plum    B   6.192710    3
7   plum    C   2.996547    2

CodePudding user response:

IIUC, you can use groupby.rank.

df_fruits['order'] = df_fruits.groupby('fruit')['price'].rank()
print(df_fruits)
#    fruit garden     price  order
# 0  peach      A  4.359949    2.0
# 1  peach      B  0.259262    1.0
# 2  peach      C  5.496625    3.0
# 3  apple      A  4.353224    3.0
# 4  apple      B  4.203678    2.0
# 5  apple      C  3.303348    1.0
# 6   plum      A  2.046486    1.0
# 7   plum      B  6.192710    3.0
# 8   plum      C  2.996547    2.0
  • Related