I have a list of tuples sorted by name. Each tuple has the elements id, name, average_rating, rating_count, developers, and size.
I'm now trying to remove entries with duplicate names and keep only the one with the highest rating_count.
A snippet of my sorted list of tuples:
[(320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
(299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 16, 'Yoshikazu Kakinoki', 3940352),
(299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352)
]
I would like an output of
[(320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
(299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352)
]
keeping only the '5x5 Shogi (MiniShogi) K55' entry with the highest rating_count, which is 900.
My take
for i in range(len(list_of_games)):
    if list_of_games[i][1] == list_of_games[i + 1][1] and \
            list_of_games[i][3] > list_of_games[i + 1][3]:
        list_of_games.remove(list_of_games[i + 1])
    elif list_of_games[i][1] == list_of_games[i + 1][1] and \
            list_of_games[i][3] < list_of_games[i + 1][3]:
        list_of_games.remove(list_of_games[i])
But I got an out of range error
line 222, in main
    if list_of_games[i][1] == list_of_games[i + 1][1] and \
IndexError: list index out of range
How do I solve this?
CodePudding user response:
You don't need to iterate over the last element. You get the IndexError because, when i reaches the last element, you compare it with the element at index last + 1, which doesn't exist.
This should solve it:
for i in range(len(list_of_games) - 1):
    if list_of_games[i][1] == list_of_games[i + 1][1] and \
            list_of_games[i][3] > list_of_games[i + 1][3]:
        list_of_games.remove(list_of_games[i + 1])
    elif list_of_games[i][1] == list_of_games[i + 1][1] and \
            list_of_games[i][3] < list_of_games[i + 1][3]:
        list_of_games.remove(list_of_games[i])
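One caveat with removing items inside the loop: range(...) is computed once from the original length, so if a removal happens before the final iteration, the list is shorter than the loop expects and the index can still run past the end. A minimal sketch that sidesteps this by building a new list instead of mutating the old one, assuming the list is already sorted by name as in the question (dedupe_adjacent is a hypothetical helper name, not something from the original post):

```python
def dedupe_adjacent(games):
    """For each run of equal names, keep the tuple with the highest rating_count.

    Assumes `games` is already sorted by name (index 1), so duplicates are adjacent.
    """
    result = []
    for game in games:
        if result and result[-1][1] == game[1]:
            # Same name as the last kept entry: keep whichever has more ratings.
            if game[3] > result[-1][3]:
                result[-1] = game
        else:
            # First time this name is seen.
            result.append(game)
    return result


list_of_games = [
    (320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 16, 'Yoshikazu Kakinoki', 3940352),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352),
]

deduped = dedupe_adjacent(list_of_games)
print(deduped)
```

Because nothing is removed from list_of_games while it is being traversed, there is no index arithmetic to get wrong, and the original list stays intact.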
CodePudding user response:
Another solution, using itertools.groupby.
First, sort the list of games by id (ASC) and rating_count (DESC), then use groupby() to remove the duplicates:
from itertools import groupby

out = [
    next(g)
    for _, g in groupby(
        sorted(list_of_games, key=lambda g: (g[0], -g[3])), lambda g: g[0]
    )
]
print(out)
Prints:
[
(299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352),
(320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656)
]
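Note that the result above comes out ordered by id, not by name. If the name order from the question should be preserved, a possible variant (a sketch, assuming list_of_games is already sorted by name so that duplicates are adjacent) is to group on the name directly and take the entry with the largest rating_count from each group; by_name is just an illustrative variable name:

```python
from itertools import groupby

list_of_games = [
    (320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 16, 'Yoshikazu Kakinoki', 3940352),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352),
]

# groupby() only merges adjacent items, so this relies on the list
# already being sorted by name (index 1).
by_name = [
    max(group, key=lambda g: g[3])  # entry with the highest rating_count
    for _, group in groupby(list_of_games, key=lambda g: g[1])
]
print(by_name)
```

This skips the extra sort and keeps the groups in the order they already appear in the list.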