I'd like some help counting how often each key occurs and listing the unique data associated with that key.
Imagine an input CSV file like this:
key1, owner1, owner2
key2, ownerA, ownerB
key2, ownerB, ownerB
key3, ownerJ, ownerK
key3, ownerJ, ownerK
key3, ownerL, ownerM
I'd like the output CSV to be:
key | Freq | List of owners with duplicates removed
key3, 3, ownerJ, ownerK, ownerL, ownerM
key2, 2, ownerA, ownerB
key1, 1, owner1, owner2
I've written code to accomplish the frequency count, but I don't know how to create the list of unique owners. Here is my code so far in Python:
import csv
import collections

multiOwner = collections.Counter()
with open('input.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        # Increment the count for the key in the first column.
        multiOwner[row[0]] += 1
print("\n".join(str(element_and_count)
                for element_and_count in multiOwner.most_common()))
How can I build the list of owners and keep it associated with the right key?
CodePudding user response:
Use nested dictionaries, and a set for the owners to remove duplicates. You can use defaultdict() to initialize the data for each key.
import csv
import collections

# Each key maps to its row count and the set of distinct owners seen so far.
multiOwner = collections.defaultdict(lambda: {'freq': 0, 'owners': set()})
with open('input.csv', newline="") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for key, owner1, owner2, *_ in csv_reader:
        multiOwner[key]['freq'] += 1
        # Adding to a set silently ignores owners that are already present.
        multiOwner[key]['owners'].add(owner1)
        multiOwner[key]['owners'].add(owner2)
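To get from that dictionary to the output the question asks for, the entries still need to be sorted by frequency and flattened into rows. The following is only a sketch of one way to do that, under a few assumptions that are not in the original posts: the helper name summarize is made up for illustration, the sample rows are read from an in-memory string rather than input.csv, skipinitialspace=True is used to drop the space after each comma, and owners are sorted alphabetically because a set has no defined order.

```python
import collections
import csv
import io


def summarize(rows):
    """Return [key, freq, owner, ...] rows, most frequent key first.

    Hypothetical helper: the name and the alphabetical owner order are
    assumptions, not part of the original question or answer.
    """
    stats = collections.defaultdict(lambda: {'freq': 0, 'owners': set()})
    for row in rows:
        if not row:  # skip blank lines in the input
            continue
        key, *owners = row
        stats[key]['freq'] += 1
        stats[key]['owners'].update(owners)  # the set drops duplicates
    ordered = sorted(stats.items(),
                     key=lambda item: item[1]['freq'],
                     reverse=True)
    return [[key, data['freq'], *sorted(data['owners'])]
            for key, data in ordered]


# Sample data copied from the question, read from a string for the demo.
sample = """key1, owner1, owner2
key2, ownerA, ownerB
key2, ownerB, ownerB
key3, ownerJ, ownerK
key3, ownerJ, ownerK
key3, ownerL, ownerM
"""

# skipinitialspace=True strips the blank that follows each comma.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
result = summarize(reader)
for row in result:
    print(", ".join(str(field) for field in row))
```

To write a real file instead of printing, the same rows could be passed to csv.writer(...).writerows(result) on a file opened with newline="".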