I have a csv file that looks something like this:
apple 12 yes
apple 15 no
apple 19 yes
and I want to use the fruit as a key and turn the rest of each row into a list of lists that is the value, so it looks like:
{'apple': [[12, 'yes'],[15, 'no'],[19, 'yes']]}
A sample of my code is below. It turns each row into its own dictionary, whereas I want to combine the data.
import csv
fp = open('fruits.csv', 'r')
reader = csv.reader(fp)
next(reader,None)
D = {}
for row in reader:
    D = {row[0]:[row[1],row[2]]}
    print(D)
My output looks like:
{'apple': [12,'yes']}
{'apple': [15,'no']}
{'apple': [19,'yes']}
CodePudding user response:
You can use a mix of sorting and groupby:
from itertools import groupby
from operator import itemgetter
_input = """apple 12 yes
apple 15 no
apple 19 yes
"""
entries = [l.split() for l in _input.splitlines()]
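result = {
    key: [values[1:] for values in grp]
    for key, grp in groupby(sorted(entries, key=itemgetter(0)), key=itemgetter(0))
}
print(result)
The entries are sorted before groupby because groupby only groups consecutive items that share a key. Without sorting, a fruit that appears in non-adjacent rows would produce several groups, and in the dictionary the later group would overwrite the earlier one. Both sorted and groupby use the first element of each line as the key. Note that split() leaves the numbers as strings ('12' rather than 12).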
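To illustrate why the sort matters, here is a small sketch that compares the keys groupby emits with and without sorting. The rows (including the "pear" entry) and the helper name group_keys are made up for the demonstration and are not part of the question's data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows where "apple" appears in two non-adjacent places.
rows = [["apple", "12", "yes"], ["pear", "7", "no"], ["apple", "19", "yes"]]


def group_keys(entries):
    """Return the sequence of keys that groupby emits for entries."""
    return [key for key, _ in groupby(entries, key=itemgetter(0))]


unsorted_keys = group_keys(rows)
sorted_keys = group_keys(sorted(rows, key=itemgetter(0)))

print(unsorted_keys)  # ['apple', 'pear', 'apple']
print(sorted_keys)    # ['apple', 'pear']
```

With unsorted input, "apple" shows up as two separate groups, so a dict comprehension built on it would keep only the second one.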
CodePudding user response:
Part of the issue you are running into is that rather than "adding" data to D[key] via append, you are just replacing it. In the end you get only the last result per key.
You might look at collections.defaultdict(list) as a strategy to initialize D, or use setdefault(). In this case I'll use setdefault() as it is straightforward, but don't discount defaultdict() in more complicated scenarios.
data = [
    ["apple", 12, "yes"],
    ["apple", 15, "no"],
    ["apple", 19, "yes"]
]
result = {}
for item in data:
    result.setdefault(item[0], []).append(item[1:])
print(result)
This should give you:
{
'apple': [
[12, 'yes'],
[15, 'no'],
[19, 'yes']
]
}
If you are keen on looking at defaultdict(), a solution based on it might look like:
import collections
data = [
    ["apple", 12, "yes"],
    ["apple", 15, "no"],
    ["apple", 19, "yes"]
]
result = collections.defaultdict(list)
for item in data:
    result[item[0]].append(item[1:])
print(dict(result))
CodePudding user response:
Your problem is that you reset D in every iteration. Don't do that.
Note that the printed output may look close to what you want, but it isn't. If you inspect the variable D after this code has finished running, you'll see that it contains only the last value that you set it to:
{'apple': [19,'yes']}
Instead, add keys to the dictionary whenever you encounter a new fruit. The value at this key will be an empty list. Then append the data you want to this empty list.
import csv
D = {}
with open('fruits.csv', 'r', newline='') as fp:
    reader = csv.reader(fp)
    next(reader, None)  # skip the first (header) row, as in the question's code
    for row in reader:
        if row[0] not in D:  # if the key doesn't already exist in D, add an empty list
            D[row[0]] = []
        D[row[0]].append(row[1:])  # append the rest of this row to the list in the dictionary
print(D)  # print the dictionary AFTER you finish creating it
Alternatively, define D as a collections.defaultdict(list) and you can skip the entire if block.
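As a rough sketch of that defaultdict variant applied to a csv.reader: the in-memory sample below stands in for fruits.csv, and its header row, column names, and comma delimiter are assumptions, since the real file isn't shown in full. Because csv.reader yields strings, the count is converted with int() to match the desired output.

```python
import collections
import csv
import io

# Stand-in for fruits.csv; the header row and comma delimiter are assumptions.
sample = io.StringIO("fruit,count,ripe\napple,12,yes\napple,15,no\napple,19,yes\n")

D = collections.defaultdict(list)  # missing keys start out as an empty list
reader = csv.reader(sample)
next(reader, None)  # skip the header row
for fruit, count, ripe in reader:
    # csv.reader yields strings, so convert the count to an int
    D[fruit].append([int(count), ripe])

result = dict(D)
print(result)  # {'apple': [[12, 'yes'], [15, 'no'], [19, 'yes']]}
```

Converting back with dict() is optional; it only makes the printed result look like a plain dictionary.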
Note that in a single dictionary, one key can only have one value. Multiple values cannot be assigned to the same key. In this case, each fruit name (key) has a single list value assigned to it. This list contains more lists inside it, but that is immaterial to the dictionary.
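A minimal sketch of that point, using made-up variable names: assigning to an existing key replaces its value, while appending to a list stored under the key accumulates rows.

```python
# Assigning to the same key twice keeps only the last value.
overwritten = {}
overwritten["apple"] = [12, "yes"]
overwritten["apple"] = [15, "no"]  # replaces the previous value

# Storing one list per key and appending to it keeps every row.
accumulated = {}
accumulated.setdefault("apple", []).append([12, "yes"])
accumulated.setdefault("apple", []).append([15, "no"])

print(overwritten)  # {'apple': [15, 'no']}
print(accumulated)  # {'apple': [[12, 'yes'], [15, 'no']]}
```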