Remove duplicates with the same NAME and keep the one with a higher USER RATING COUNT. (python)

Time:02-12

I have sorted a list of tuples by name; each tuple has the elements id, name, average_rating, rating_count, developers, and size.

I'm now trying to remove tuples with duplicate names and keep only the one with the higher rating_count.

A snippet of my sorted list of tuples:

[(320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
(299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 16, 'Yoshikazu Kakinoki', 3940352),
(299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352)
]

I would like the output to be:

[(320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
(299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352)
]

That is, only the '5x5 Shogi (MiniShogi) K55' entry with the highest rating_count (900) is kept.

My attempt:

    for i in range(len(list_of_games)):
        if list_of_games[i][1] == list_of_games[i + 1][1] and \
                list_of_games[i][3] > list_of_games[i + 1][3]:
            list_of_games.remove(list_of_games[i + 1])
        elif list_of_games[i][1] == list_of_games[i + 1][1] and \
                list_of_games[i][3] < list_of_games[i + 1][3]:
            list_of_games.remove(list_of_games[i])

But I got an out-of-range error:

line 222, in main
    if list_of_games[i][1] == list_of_games[i + 1][1] and \
IndexError: list index out of range

How can I solve this?

Answer:

You don't need to iterate over the last element. You got the out-of-range error because, when i is the last index, you compare that element with list_of_games[i + 1], which doesn't exist.

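This should solve it. Stop one element early, and walk the list backwards so that deleting an element never shifts the indices that are still to be visited (going forwards, a deletion shrinks the list and i + 1 can run past the end again):

# Start at the second-to-last index so list_of_games[i + 1] always exists;
# walking backwards means a deletion only shifts elements already visited
for i in range(len(list_of_games) - 2, -1, -1):
    if list_of_games[i][1] == list_of_games[i + 1][1]:
        # Same name: delete whichever entry has the lower rating_count
        # (on a tie, the first of the pair is kept)
        if list_of_games[i][3] >= list_of_games[i + 1][3]:
            del list_of_games[i + 1]
        else:
            del list_of_games[i]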
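An alternative that sidesteps index bookkeeping (and does not even need the list to be sorted) is a single pass with a dict keyed by name, keeping the tuple with the highest rating_count seen so far. This is a minimal sketch, assuming the tuple layout (id, name, average_rating, rating_count, developers, size) described in the question; the function name dedupe_by_name is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed tuple layout, as described in the question:
# (id, name, average_rating, rating_count, developers, size)
Game = Tuple[int, str, float, int, str, int]


def dedupe_by_name(games: List[Game]) -> List[Game]:
    """Keep, for each name, the tuple with the highest rating_count (index 3)."""
    best: Dict[str, Game] = {}
    for game in games:
        name, rating_count = game[1], game[3]
        # Replace the stored entry only when this one has strictly more
        # ratings, so on a tie the first occurrence wins
        if name not in best or rating_count > best[name][3]:
            best[name] = game
    # dicts preserve insertion order (Python 3.7+), and updating an existing
    # key does not move it, so names keep the order they were first seen in
    return list(best.values())


list_of_games = [
    (320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 16, 'Yoshikazu Kakinoki', 3940352),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352),
]
print(dedupe_by_name(list_of_games))
```

This builds a new list instead of deleting from list_of_games in place, so there is no risk of indexing past a list that is shrinking during the loop.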

Answer:

Another solution, using itertools.groupby.

First, sort the list of games by id (ascending) and rating_count (descending), then use groupby() to keep only the first tuple of each id group, which is the one with the highest rating_count:

from itertools import groupby

out = [
    next(g)
    for _, g in groupby(
        sorted(list_of_games, key=lambda g: (g[0], -g[3])), lambda g: g[0]
    )
]
print(out)

Prints:

[
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352), 
    (320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656)
]
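The groupby answer keys on id, which works for this sample because both '5x5 Shogi (MiniShogi) K55' rows share the id 299139129. Since the question asks about duplicate names, the same technique can key on the name (index 1) instead. A minimal sketch, assuming the same tuple layout as in the question:

```python
from itertools import groupby
from operator import itemgetter

list_of_games = [
    (320469536, '4_IN_1_ROW_Lite', 2.5, 3593, 'AOBO Co.,Ltd', 7750656),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 16, 'Yoshikazu Kakinoki', 3940352),
    (299139129, '5x5 Shogi (MiniShogi) K55', 4.0, 900, 'Yoshikazu Kakinoki', 3940352),
]

# Sort by name (ascending), then rating_count (descending), so that within
# each run of equal names the tuple with the most ratings comes first
ordered = sorted(list_of_games, key=lambda game: (game[1], -game[3]))

# groupby() only groups *consecutive* items with equal keys, which is why the
# sort above is required; next(group) takes the first tuple of each name group
out = [next(group) for _, group in groupby(ordered, key=itemgetter(1))]
print(out)
```

Because the result is ordered by name, it comes out in the same order as the desired output in the question.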