How would I merge identical dictionary keys into one?

I have a csv file that looks something like this:

apple   12   yes
apple   15   no
apple   19   yes

and I want to use the fruit as a key and turn the rest of each row into a list of lists that becomes the value, so it looks like:

{'apple': [[12, 'yes'],[15, 'no'],[19, 'yes']]}

A sample of my code is below. It turns each row into its own dictionary, whereas I want to combine the data.

import csv
fp = open('fruits.csv', 'r')
reader = csv.reader(fp)
next(reader,None)
D = {}
for row in reader:
    D = {row[0]:[row[1],row[2]]}
    print(D)

My output looks like:

{'apple': [12,'yes']}
{'apple': [15,'no']}
{'apple': [19,'yes']}

CodePudding user response：

You can use a mix of sorting and groupby:

from itertools import groupby
from operator import itemgetter

_input = """apple   12   yes
apple   15   no
apple   19   yes
"""
entries = [line.split() for line in _input.splitlines()]
result = {
    key: [values[1:] for values in grp]
    for key, grp in groupby(sorted(entries, key=itemgetter(0)), key=itemgetter(0))
}
print(result)

groupby only groups consecutive items that share a key, so the entries are sorted first to make sure each key ends up in a single group. Both sorted and groupby use the first element of each line as the key.
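
To see why the sort matters, here is a small sketch comparing groupby on unsorted and sorted input. The rows, including the extra "pear" row, are made up for illustration and are not from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, deliberately not sorted by fruit
rows = [["apple", "12", "yes"], ["pear", "7", "no"], ["apple", "15", "no"]]

# Without sorting, "apple" starts two separate groups
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]
print(unsorted_keys)  # ['apple', 'pear', 'apple']

# After sorting, each key forms exactly one group
sorted_rows = sorted(rows, key=itemgetter(0))
merged = {
    key: [r[1:] for r in grp]
    for key, grp in groupby(sorted_rows, key=itemgetter(0))
}
print(merged)  # {'apple': [['12', 'yes'], ['15', 'no']], 'pear': [['7', 'no']]}
```

Because sorted is stable, rows with the same fruit keep their original relative order inside each group.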

CodePudding user response：

Part of the issue you are running into is that rather than "adding" data to D[key] via append, you are just replacing it. In the end you get only the last result per key.

You might look at collections.defaultdict(list) as a strategy to initialize D, or use setdefault(). In this case I'll use setdefault() as it is straightforward, but don't discount defaultdict() in more complicated scenarios.

data = [
    ["apple", 12, "yes"],
    ["apple", 15, "no"],
    ["apple", 19, "yes"]
]

result = {}
for item in data:
    result.setdefault(item[0], []).append(item[1:])
print(result)

This should give you:

{
    'apple': [
        [12, 'yes'],
        [15, 'no'],
        [19, 'yes']
    ]
}

If you are keen on looking at defaultdict(), a solution based on it might look like:

import collections

data = [
    ["apple", 12, "yes"],
    ["apple", 15, "no"],
    ["apple", 19, "yes"]
]

result = collections.defaultdict(list)
for item in data:
    result[item[0]].append(item[1:])
print(dict(result))

CodePudding user response：

Your problem is you reset D in every iteration. Don't do that.

Note that the printed output may look close to what you want, but it isn't. If you inspect the variable D after this code has finished running, you'll see that it contains only the last value that was assigned to it:

{'apple': [19,'yes']}

Instead, add keys to the dictionary whenever you encounter a new fruit. The value at this key will be an empty list. Then append the data you want to this empty list.

import csv
fp = open('fruits.csv', 'r')
reader = csv.reader(fp)
next(reader,None)
D = {}
for row in reader:
    if row[0] not in D: # if the key doesn't already exist in D, add an empty list
        D[row[0]] = []
    D[row[0]].append(row[1:]) # append the rest of this row to the list in the dictionary

print(D) # print the dictionary AFTER you finish creating it

Alternatively, define D as a collections.defaultdict(list) and you can skip the entire if block.
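
A minimal sketch of that defaultdict variant follows. To keep it self-contained it reads from an in-memory string instead of fruits.csv; the header names and the comma delimiter are assumptions, since the question does not show the real file layout. It also converts the count to int, because csv.reader always yields strings.

```python
import csv
import io
from collections import defaultdict

# Stand-in for fruits.csv; header names and comma delimiter are assumptions
fp = io.StringIO("fruit,count,ripe\napple,12,yes\napple,15,no\napple,19,yes\n")

reader = csv.reader(fp)
next(reader, None)  # skip the header row

D = defaultdict(list)  # a missing key starts out as an empty list
for row in reader:
    # csv.reader yields strings, so convert the count explicitly
    D[row[0]].append([int(row[1]), row[2]])

print(dict(D))  # {'apple': [[12, 'yes'], [15, 'no'], [19, 'yes']]}
```

With a real file, something like `with open('fruits.csv', newline='') as fp:` would replace the StringIO line and also close the file when the block ends.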

Note that in a single dictionary, one key can have only one value. Multiple values cannot be assigned to the same key. In this case, each fruit name (key) has a single list value assigned to it. This list contains more lists inside it, but that is immaterial to the dictionary.
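
As a quick illustration of that last point, the sketch below (variable names are only for illustration) contrasts assigning to the same key twice with appending to a single list stored under that key.

```python
# Assigning to an existing key replaces its value; nothing is merged
overwritten = {}
overwritten["apple"] = [12, "yes"]
overwritten["apple"] = [15, "no"]
print(overwritten)  # {'apple': [15, 'no']}

# One list value per key, with each row appended to that list
nested = {"apple": []}
nested["apple"].append([12, "yes"])
nested["apple"].append([15, "no"])
print(nested)  # {'apple': [[12, 'yes'], [15, 'no']]}
```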