How would I merge identical dictionary keys into one?

Time:04-04

I have a csv file that looks something like this:

apple   12   yes
apple   15   no
apple   19   yes

and I want to use the fruit as a key and turn the rest of each row into a list, collecting those lists into a list of lists as the value, so it looks like:

{'apple': [[12, 'yes'],[15, 'no'],[19, 'yes']]}

A sample of my code is below. It turns each row into its own dictionary, when what I want is to combine the data.

import csv
fp = open('fruits.csv', 'r')
reader = csv.reader(fp)
next(reader,None)
D = {}
for row in reader:
    D = {row[0]:[row[1],row[2]]}
    print(D)

My output looks like:

{'apple': [12,'yes']}
{'apple': [15,'no']}
{'apple': [19,'yes']}

CodePudding user response:

You can use a mix of sorting and groupby:

from itertools import groupby
from operator import itemgetter

_input = """apple   12   yes
apple   15   no
apple   19   yes
"""
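entries = [line.split() for line in _input.splitlines()]
# Sort first so equal keys are adjacent, then group on the first column.
result = {key: [values[1:] for values in grp] for key, grp in groupby(sorted(entries, key=itemgetter(0)), key=itemgetter(0))}
print(result)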

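groupby only merges consecutive items that share a key, so the entries are sorted first; otherwise the same key could show up in more than one group. Both sorted and groupby use the same key function, which takes the first element of each line.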
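
To illustrate why the sort matters, here is a minimal sketch with made-up rows (the "pear" row is not from the question and is only there to break up the run of "apple" rows). Without sorting, "apple" comes out of groupby twice:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; "pear" is added only to separate the two "apple" rows.
rows = [["apple", "12", "yes"], ["pear", "7", "no"], ["apple", "15", "no"]]

# Without sorting, groupby starts a new group each time the key changes,
# so "apple" is reported as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]

# After sorting on the same key, each fruit forms exactly one group.
grouped = {
    key: [values[1:] for values in grp]
    for key, grp in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0))
}

print(unsorted_keys)
print(grouped)
```

Because sorted is stable, the rows for each fruit keep their original relative order inside the group.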

CodePudding user response:

Part of the issue you are running into is that rather than "adding" data to D[key] via append, you are just replacing it. In the end you get only the last result per key.

You might look at collections.defaultdict(list) as a strategy to initialize D, or use setdefault(). In this case I'll use setdefault() as it is straightforward, but don't discount defaultdict() in more complicated scenarios.

data = [
    ["apple", 12, "yes"],
    ["apple", 15, "no"],
    ["apple", 19, "yes"]
]

result = {}
for item in data:
    result.setdefault(item[0], []).append(item[1:])
print(result)

This should give you:

{
    'apple': [
        [12, 'yes'],
        [15, 'no'],
        [19, 'yes']
    ]
}

If you are keen on looking at defaultdict(), a solution based on it might look like:

import collections

data = [
    ["apple", 12, "yes"],
    ["apple", 15, "no"],
    ["apple", 19, "yes"]
]

result = collections.defaultdict(list)
for item in data:
    result[item[0]].append(item[1:])
print(dict(result))

CodePudding user response:

Your problem is you reset D in every iteration. Don't do that.

Note that the output may look somewhat related to what you want, but this isn't actually the case. If you inspect the variable D after this code is finished running, you'll see that it contains only the last value that you set it to:

{'apple': [19,'yes']}

Instead, add keys to the dictionary whenever you encounter a new fruit. The value at this key will be an empty list. Then append the data you want to this empty list.

import csv
fp = open('fruits.csv', 'r')
reader = csv.reader(fp)
next(reader,None)
D = {}
for row in reader:
    if row[0] not in D: # if the key doesn't already exist in D, add an empty list
        D[row[0]] = []
    D[row[0]].append(row[1:]) # append the rest of this row to the list in the dictionary

print(D) # print the dictionary AFTER you finish creating it

Alternatively, define D as a collections.defaultdict(list) and you can skip the entire if block.
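
A minimal sketch of that defaultdict variant follows. It reads from an in-memory string instead of fruits.csv so it runs on its own; the header row, the comma delimiter, and the int() conversion are assumptions, not something the question confirms.

```python
import csv
import io
from collections import defaultdict

# Stand-in for fruits.csv; the header names and comma delimiter are assumptions.
sample = io.StringIO("fruit,count,flag\napple,12,yes\napple,15,no\napple,19,yes\n")

reader = csv.reader(sample)
next(reader, None)  # skip the header row, as in the question

D = defaultdict(list)  # missing keys start out as an empty list
for fruit, count, flag in reader:
    # csv.reader yields strings, so the count is converted explicitly
    D[fruit].append([int(count), flag])

print(dict(D))
```

With a real file, the StringIO object would be replaced by something like `with open('fruits.csv', newline='') as fp:`, and the int() call can be dropped if the numbers should stay as strings.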

Note that in a single dictionary, one key can only have one value. Multiple values cannot be assigned to the same key. In this case, each fruit name (key) has a single list value assigned to it. This list contains more lists inside it, but that is immaterial to the dictionary.
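
As a small illustration of that point (the values are the ones from the question, the variable names are just for this sketch), assigning to an existing key replaces its value, whereas appending to a list stored under the key keeps every row:

```python
# Assigning to an existing key replaces its value; it does not add a second one.
D = {}
D["apple"] = [12, "yes"]
D["apple"] = [15, "no"]  # overwrites the previous value
overwritten = dict(D)

# To keep every row, store one list per key and append to it.
D = {"apple": []}
D["apple"].append([12, "yes"])
D["apple"].append([15, "no"])

print(overwritten)
print(D)
```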
