Python - 2D list - find duplicates in one column and sum values in another column

Time:11-26

I have a 2D list that contains soccer player names, the number of times they scored a goal, and the number of times they attempted a shot on goal, respectively.

player_stats = [['Adam', 5, 10], ['Kyle', 12, 18], ['Jo', 20, 35], ['Adam', 15, 20], ['Charlie', 31, 58], ['Jo', 6, 14], ['Adam', 10, 15]]

From this list, I'm trying to return another list that shows only one instance of each player with their respective total goals and total attempts on goal, like so:

player_stats_totals = [['Adam', 30, 45], ['Kyle', 12, 18], ['Jo', 26, 49], ['Charlie', 31, 58]]

After searching on Stack Overflow I was able to learn (from this thread) how to return the indexes of the duplicate players

x = [player_stats[i][0] for i in range(len(player_stats))]

for i in range(len(x)):
    if (x[i] in x[:i]) or (x[i] in x[i + 1:]): print(x[i], i)

but got stuck on how to proceed from there, and I'm not sure whether this method is even relevant to what I need.

What's the most efficient way to return the desired list of totals?

CodePudding user response:

What you want to do is use a dictionary where the key is the player name and the value is a list containing [goals, shots]. Constructing it would look like this:

all_games_stats = {}
for stat in player_stats:
    player, goals, shots = stat
    if player not in all_games_stats:
        all_games_stats[player] = [goals, shots]
    else:
        stat_list = all_games_stats[player]
        stat_list[0] += goals
        stat_list[1] += shots

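Then, if you want to represent the players and their stats as a list, you would do: list(all_games_stats.items()). Note that this gives (player, [goals, shots]) tuples rather than flat [player, goals, shots] rows.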
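To get from the dictionary to the flat format the question asks for, something like the following should work. It is only a sketch: the dictionary literal stands in for the result of the loop above (using the totals from the question), and the names pairs and rows are illustrative.

```python
# Stand-in for the dictionary built by the loop above (totals taken from the question).
all_games_stats = {'Adam': [30, 45], 'Kyle': [12, 18], 'Jo': [26, 49], 'Charlie': [31, 58]}

# items() yields (player, [goals, shots]) tuples, so the stats stay nested.
pairs = list(all_games_stats.items())

# Unpack each tuple to build flat [player, goals, shots] rows instead.
rows = [[player, goals, shots] for player, (goals, shots) in all_games_stats.items()]

print(pairs[0])  # ('Adam', [30, 45])
print(rows[0])   # ['Adam', 30, 45]
```

Dictionaries keep insertion order in Python 3.7+, so the rows come out in the order each player first appeared.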

CodePudding user response:

You can convert the list to a dictionary. (It can always be converted back to a list once done.) This works:

player_stats = [['Adam', 5, 10], ['Kyle', 12, 18], ['Jo', 20, 35], ['Adam', 15, 20], ['Charlie', 31, 58], ['Jo', 6, 14], ['Adam', 10, 15]]

new_stats = {}


for item in player_stats:
    if item[0] not in new_stats:
        new_stats[item[0]] = [item[1], item[2]]
    else:
        new_stats[item[0]][0] += item[1]
        new_stats[item[0]][1] += item[2]

print(new_stats)

CodePudding user response:

Use a dictionary to accumulate the values for a given player:

player_stats = [['Adam', 5, 10], ['Kyle', 12, 18], ['Jo', 20, 35], ['Adam', 15, 20], ['Charlie', 31, 58], ['Jo', 6, 14], ['Adam', 10, 15]]

lookup = {}
for player, first, second in player_stats:
    
    # if the player has not been seen add a new list with 0, 0 
    if player not in lookup:
        lookup[player] = [0, 0]
    
    # get the accumulated total so far 
    first_total, second_total = lookup[player]
    
    # add the current values to the accumulated total, and update the values 
    lookup[player] = [first_total + first, second_total + second]

# create the output in the expected format
res = [[player, first, second] for player, (first, second) in lookup.items()]
print(res)

Output

[['Adam', 30, 45], ['Kyle', 12, 18], ['Jo', 26, 49], ['Charlie', 31, 58]]

A more advanced, and pythonic, version is to use a collections.defaultdict:

from collections import defaultdict

player_stats = [['Adam', 5, 10], ['Kyle', 12, 18], ['Jo', 20, 35],
                ['Adam', 15, 20], ['Charlie', 31, 58], ['Jo', 6, 14], ['Adam', 10, 15]]

lookup = defaultdict(lambda: [0, 0])
for player, first, second in player_stats:
    # get the accumulated total so far
    first_total, second_total = lookup[player]

    # add the current values to the accumulated total, and update the values
    lookup[player] = [first_total + first, second_total + second]

# create the output in the expected format
res = [[player, first, second] for player, (first, second) in lookup.items()]

print(res)

This approach has the advantage of skipping the explicit initialisation check. Both approaches are O(n).

Notes

The expression:

res = [[player, first, second] for player, (first, second) in lookup.items()]

is a list comprehension, equivalent to the following for loop:

res = []
for player, (first, second) in lookup.items():
    res.append([player, first, second])

Additionally, read this for understanding unpacking.
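As a small illustration of the unpacking used in that loop (a sketch with a made-up sample item, not part of the solution itself): each item from lookup.items() is a (key, value) tuple whose value is a two-element list, and the parenthesised target unpacks the inner list in the same step.

```python
# One item as produced by lookup.items(): a (player, [first, second]) tuple.
item = ('Adam', [30, 45])

# The nested target (first, second) unpacks the inner list at the same time.
player, (first, second) = item

row = [player, first, second]
print(row)  # ['Adam', 30, 45]
```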

CodePudding user response:

I might as well submit something, too. Here's yet another method with some list comprehension worked in:

# Unique values to new dictionary with goal and shots on goal default entries 
agg_stats = dict.fromkeys(set([p[0] for p in player_stats]), [0, 0])

# Iterate over the player stats list
for player in player_stats:
    # Set entry to sum of current and next stats values for the corresponding player.
    agg_stats[player[0]] = [sum([agg_stats.get(player[0])[i], stat]) for i, stat in enumerate(player[1:])]
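A possible variant of the same idea, sketched here as an alternative rather than as this answer's method: a dict comprehension gives every player a separate [0, 0] list and keeps first-seen order (which a set does not guarantee), and the final line converts the result to the list format from the question. The name player_stats_totals is borrowed from the question.

```python
player_stats = [['Adam', 5, 10], ['Kyle', 12, 18], ['Jo', 20, 35], ['Adam', 15, 20],
                ['Charlie', 31, 58], ['Jo', 6, 14], ['Adam', 10, 15]]

# One independent [0, 0] list per player, in order of first appearance (Python 3.7+).
agg_stats = {row[0]: [0, 0] for row in player_stats}

# Accumulate goals and shots in place.
for name, goals, shots in player_stats:
    agg_stats[name][0] += goals
    agg_stats[name][1] += shots

# Flatten to [name, goals, shots] rows.
player_stats_totals = [[name, *totals] for name, totals in agg_stats.items()]
print(player_stats_totals)
# [['Adam', 30, 45], ['Kyle', 12, 18], ['Jo', 26, 49], ['Charlie', 31, 58]]
```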