More optimized and scalable way to write this Pandas code

I have the following data frame as input:

FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R6,7,Japanese,0,No
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Amy,R1,2,French,2,Yes
Amy,R4,3,French,2,Yes
Amy,R3,4,French,1,Yes
Amy,R10,5,Mexican,2,Yes
Ben,R7,1,Japanese,1,Yes
Ben,R4,5,French,2,Yes
Ben,R10,6,Mexican,2,Yes
Ben,R1,9,French,2,Yes
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
Cathy,R10,7,Mexican,2,Yes
Cathy,R7,9,Japanese,1,Yes
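
To reproduce the examples, the sample above can be loaded into a DataFrame straight from the CSV text. This is only a minimal sketch: the name `csv_text` is illustrative, and `df` is chosen to match the variable used in the code further down.

```python
import io

import pandas as pd

# The sample data from the question, pasted verbatim as CSV text.
csv_text = """FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R6,7,Japanese,0,No
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Amy,R1,2,French,2,Yes
Amy,R4,3,French,2,Yes
Amy,R3,4,French,1,Yes
Amy,R10,5,Mexican,2,Yes
Ben,R7,1,Japanese,1,Yes
Ben,R4,5,French,2,Yes
Ben,R10,6,Mexican,2,Yes
Ben,R1,9,French,2,Yes
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
Cathy,R10,7,Mexican,2,Yes
Cathy,R7,9,Japanese,1,Yes
"""

# read_csv accepts any file-like object, so wrap the string in StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (18, 6)
```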

I want to recommend the top 4 restaurants to each friend according to their InterestRank (1 being most interested, 10 being least interested), subject to two conditions: no more than 2 restaurants of the same cuisine type are recommended to any one friend, and restaurants with the lowest cost are recommended first (i.e. 0, then 1, then 2). I came up with the following code using Pandas to achieve this:

result = (
    df
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `(friend, cost)` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'Cost', 'InterestRank'], ascending=True)
    .groupby(['FriendName','Cost'])
    .head(4)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)

The resulting data frame looks like the following:

FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R1,2,French,2,Yes
Amy,R10,5,Mexican,2,Yes
Amy,R6,7,Japanese,0,No
Ben,R7,1,Japanese,1,Yes
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Ben,R4,5,French,2,Yes
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes

I want the final output in the following format:

FriendName,Restaurant,RecommendationRank
Amy,R2,1
Amy,R1,2
Amy,R10,3
Amy,R6,4
Ben,R7,1
Ben,R5,2
Ben,R2,3
Ben,R4,4
Cathy,R5,1
Cathy,R8,2
Cathy,R1,3
Cathy,R3,4

How can I achieve the above output? Also, is there a more scalable way to write the above code to get this result?

CodePudding user response:

There is no need to group by Cost, since it is only a ranking factor; use it as a tie-breaker when sorting instead:

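(df.sort_values(['InterestRank', 'Cost'])   # rank by interest; cost breaks ties
   .groupby(['FriendName', 'Cuisine'])
   .head(2)                                 # at most 2 restaurants per cuisine per friend
   .groupby('FriendName')
   .head(4)                                 # top 4 restaurants per friend
   # a stable sort keeps the interest order within each friend
   .sort_values('FriendName', kind='stable')[
      ['FriendName', 'Restaurant']
   ].assign(
       RecommendationRank=lambda x: x.groupby('FriendName').cumcount() + 1
   )
)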

   FriendName Restaurant  RecommendationRank
0         Amy         R2                   1
6         Amy         R1                   2
9         Amy        R10                   3
1         Amy         R6                   4
10        Ben         R7                   1
2         Ben         R5                   2
3         Ben         R2                   3
11        Ben         R4                   4
4       Cathy         R5                   1
5       Cathy         R8                   2
14      Cathy         R1                   3
15      Cathy         R3                   4
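
If the limits need to vary, the same chain can be wrapped in a small function. This is only a sketch under assumptions: the function name `recommend` and the parameters `per_cuisine` and `top_n` are made up for illustration, and it sorts by FriendName up front so that no second sort is needed. It relies on `groupby(...).head(n)` keeping rows in their existing order.

```python
import pandas as pd


def recommend(df, per_cuisine=2, top_n=4):
    """Hypothetical helper: top_n restaurants per friend, capped per cuisine."""
    ranked = (
        # One sort up front: friend, then interest, with cost as tie-breaker.
        df.sort_values(['FriendName', 'InterestRank', 'Cost'])
          # Keep at most `per_cuisine` rows for each (friend, cuisine) pair.
          .groupby(['FriendName', 'Cuisine'])
          .head(per_cuisine)
          # Keep the first `top_n` remaining rows for each friend.
          .groupby('FriendName')
          .head(top_n)
          .reset_index(drop=True)
    )
    # Number the surviving rows 1..top_n within each friend.
    ranked['RecommendationRank'] = ranked.groupby('FriendName').cumcount() + 1
    return ranked[['FriendName', 'Restaurant', 'RecommendationRank']]
```

Calling `recommend(df)` on the sample data should give the same twelve rows as above, with a fresh 0-based index. The work is one sort plus two `groupby(...).head()` passes, so it avoids any Python-level loop over friends.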