I have the following data frame as input:
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R6,7,Japanese,0,No
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Amy,R1,2,French,2,Yes
Amy,R4,3,French,2,Yes
Amy,R3,4,French,1,Yes
Amy,R10,5,Mexican,2,Yes
Ben,R7,1,Japanese,1,Yes
Ben,R4,5,French,2,Yes
Ben,R10,6,Mexican,2,Yes
Ben,R1,9,French,2,Yes
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
Cathy,R10,7,Mexican,2,Yes
Cathy,R7,9,Japanese,1,Yes
I want to recommend the top 4 restaurants to each friend according to their InterestRank (1 = most interested, 10 = least interested), subject to two conditions: no more than 2 restaurants of the same cuisine may be recommended to any one friend, and restaurants with the least cost should be recommended first (i.e., cost 0, then 1, then 2). I came up with the following Pandas code to achieve this:
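To make the problem reproducible, the sample input can be loaded into a DataFrame named `df` (the name the code below assumes) with `pandas.read_csv` and `io.StringIO`. This is a minimal sketch; the CSV text is copied verbatim from the input above.

```python
import io

import pandas as pd

# Sample input copied from the question
CSV_TEXT = """FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R6,7,Japanese,0,No
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Amy,R1,2,French,2,Yes
Amy,R4,3,French,2,Yes
Amy,R3,4,French,1,Yes
Amy,R10,5,Mexican,2,Yes
Ben,R7,1,Japanese,1,Yes
Ben,R4,5,French,2,Yes
Ben,R10,6,Mexican,2,Yes
Ben,R1,9,French,2,Yes
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
Cathy,R10,7,Mexican,2,Yes
Cathy,R7,9,Japanese,1,Yes
"""

# Parse the text as if it were a file on disk
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.head())
```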
result = (
    df
    # Sort `(friend, cuisine)` group by interest rank and take the top 2
    .sort_values(by=['FriendName', 'Cuisine', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cuisine'])
    .head(2)
    # Sort `(friend, cost)` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'Cost', 'InterestRank'], ascending=True)
    .groupby(['FriendName', 'Cost'])
    .head(4)
    # Sort `friend` group by interest rank and take the top 4
    .sort_values(by=['FriendName', 'InterestRank'], ascending=True)
    .groupby(['FriendName'])
    .head(4)
    # Reset index, which was just "scrambled" from the sorting and slicing
    .reset_index(drop=True)
)
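The pipeline above relies on `groupby(...).head(n)`, which keeps the first `n` rows of each group in the frame's current row order and preserves the original index. It does not sort within groups, so the preceding `sort_values` is what decides which rows survive. A minimal sketch on a hypothetical four-row toy frame:

```python
import pandas as pd

# Hypothetical toy frame: one friend, three French and one Mexican restaurant
toy = pd.DataFrame({
    "FriendName": ["Amy", "Amy", "Amy", "Amy"],
    "Restaurant": ["R3", "R2", "R1", "R10"],
    "InterestRank": [4, 1, 2, 5],
    "Cuisine": ["French", "French", "French", "Mexican"],
})

# head(2) keeps the first two rows of each (friend, cuisine) group in the
# current order, so sorting by InterestRank first keeps the two best-ranked
capped = (
    toy.sort_values("InterestRank")
    .groupby(["FriendName", "Cuisine"])
    .head(2)
)
print(capped)
```

Here the French restaurant R3 (rank 4) is dropped because R2 and R1 come before it after sorting.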
The resulting data frame looks like the following:
FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R1,2,French,2,Yes
Amy,R10,5,Mexican,2,Yes
Amy,R6,7,Japanese,0,No
Ben,R7,1,Japanese,1,Yes
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Ben,R4,5,French,2,Yes
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
I want the final output in the following format:
FriendName,Restaurant,RecommendationRank
Amy,R2,1
Amy,R1,2
Amy,R10,3
Amy,R6,4
Ben,R7,1
Ben,R5,2
Ben,R2,3
Ben,R4,4
Cathy,R5,1
Cathy,R8,2
Cathy,R1,3
Cathy,R3,4
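Given the intermediate frame (called `result` in the code above), the rank column is just the position of each row within its friend's group. A minimal sketch, assuming `result` is already in the order shown in the intermediate output; the rows are re-read from that output so the snippet runs on its own:

```python
import io

import pandas as pd

# Intermediate frame copied from the question's output
RESULT_CSV = """FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R1,2,French,2,Yes
Amy,R10,5,Mexican,2,Yes
Amy,R6,7,Japanese,0,No
Ben,R7,1,Japanese,1,Yes
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Ben,R4,5,French,2,Yes
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
"""
result = pd.read_csv(io.StringIO(RESULT_CSV))

# cumcount() numbers the rows within each friend 0, 1, 2, ... in the
# current row order; adding 1 turns that into a 1-based recommendation rank
final = result[["FriendName", "Restaurant"]].assign(
    RecommendationRank=result.groupby("FriendName").cumcount() + 1
)
print(final.to_csv(index=False))
```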
How can I achieve the above output? Also, is there a more scalable way to write the above code?
Answer:
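There is no need to group by Cost, since it is only a ranking factor. Sort by InterestRank and then Cost, keep at most 2 rows per (friend, cuisine), take the top 4 per friend, and number the rows: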
(df.sort_values(['InterestRank', 'Cost'])
   .groupby(['FriendName', 'Cuisine'])
   .head(2)
   .groupby('FriendName')
   .head(4)
   # A stable sort keeps the interest-rank order within each friend
   .sort_values('FriendName', kind='stable')[['FriendName', 'Restaurant']]
   .assign(
       RecommendationRank=lambda x: x.groupby('FriendName').cumcount() + 1
   )
)
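For reference, here is the same approach as a self-contained script, with the sample data inlined. It is a sketch under the same assumptions as the answer; the only additions are `kind="stable"` on the friend sort (the default quicksort is not guaranteed to keep the existing order of tied rows) and a final `reset_index`.

```python
import io

import pandas as pd

# Sample input copied from the question
CSV_TEXT = """FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R6,7,Japanese,0,No
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Amy,R1,2,French,2,Yes
Amy,R4,3,French,2,Yes
Amy,R3,4,French,1,Yes
Amy,R10,5,Mexican,2,Yes
Ben,R7,1,Japanese,1,Yes
Ben,R4,5,French,2,Yes
Ben,R10,6,Mexican,2,Yes
Ben,R1,9,French,2,Yes
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
Cathy,R10,7,Mexican,2,Yes
Cathy,R7,9,Japanese,1,Yes
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

recommendations = (
    df.sort_values(["InterestRank", "Cost"])  # best interest first, cheaper on ties
    .groupby(["FriendName", "Cuisine"])
    .head(2)  # at most 2 restaurants per cuisine for each friend
    .groupby("FriendName")
    .head(4)  # at most 4 restaurants per friend
    .sort_values("FriendName", kind="stable")  # keeps rank order within a friend
    .loc[:, ["FriendName", "Restaurant"]]
    .assign(RecommendationRank=lambda x: x.groupby("FriendName").cumcount() + 1)
    .reset_index(drop=True)
)
print(recommendations.to_string(index=False))
```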
FriendName Restaurant RecommendationRank
0 Amy R2 1
6 Amy R1 2
9 Amy R10 3
1 Amy R6 4
10 Ben R7 1
2 Ben R5 2
3 Ben R2 3
11 Ben R4 4
4 Cathy R5 1
5 Cathy R8 2
14 Cathy R1 3
15 Cathy R3 4
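On the scalability question, one option is to wrap the pipeline in a function whose limits are parameters. The sketch below uses a hypothetical helper `recommend` with hypothetical parameters `top_n` and `max_per_cuisine`. It sorts by friend first (then InterestRank, then Cost), so the rows are already grouped per friend in rank order and no re-sort is needed at the end. Everything is a single sort plus vectorized `groupby().head()` / `cumcount()` calls, with no Python-level loops.

```python
import io

import pandas as pd

# Sample input copied from the question
CSV_TEXT = """FriendName,Restaurant,InterestRank,Cuisine,Cost,Alcohol
Amy,R2,1,French,1,No
Amy,R6,7,Japanese,0,No
Ben,R5,2,Japanese,1,No
Ben,R2,3,French,1,No
Cathy,R5,1,Japanese,1,No
Cathy,R8,2,Mexican,1,No
Amy,R1,2,French,2,Yes
Amy,R4,3,French,2,Yes
Amy,R3,4,French,1,Yes
Amy,R10,5,Mexican,2,Yes
Ben,R7,1,Japanese,1,Yes
Ben,R4,5,French,2,Yes
Ben,R10,6,Mexican,2,Yes
Ben,R1,9,French,2,Yes
Cathy,R1,5,French,2,Yes
Cathy,R3,6,French,1,Yes
Cathy,R10,7,Mexican,2,Yes
Cathy,R7,9,Japanese,1,Yes
"""


def recommend(frame, top_n=4, max_per_cuisine=2):
    """Hypothetical helper: rank restaurants per friend.

    Rows are ordered by InterestRank (lower is better) with Cost as a
    tie-breaker; each friend gets at most `max_per_cuisine` restaurants
    per cuisine and at most `top_n` restaurants overall.
    """
    ranked = (
        frame.sort_values(["FriendName", "InterestRank", "Cost"])
        .groupby(["FriendName", "Cuisine"])
        .head(max_per_cuisine)
        .groupby("FriendName")
        .head(top_n)
    )
    # cumcount() aligns on the index, so it can be assigned to the column subset
    return (
        ranked.loc[:, ["FriendName", "Restaurant"]]
        .assign(RecommendationRank=ranked.groupby("FriendName").cumcount() + 1)
        .reset_index(drop=True)
    )


df = pd.read_csv(io.StringIO(CSV_TEXT))
default = recommend(df)                                  # the question's rules
strict = recommend(df, top_n=3, max_per_cuisine=1)       # tighter limits
print(strict.to_string(index=False))
```

With `max_per_cuisine=1`, each friend's three picks come from three different cuisines.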