I have a CSV file of around 8000 rows and 2 columns:
userID bookTitle
1123 book title 1
1123 book title 2
1123 book title 3
54 book title 2
776 book title 7
776 book title 1
I need it to be transformed to:
1123, book title 1, book title 2, book title 3
54, book title 2
776, book title 7, book title 1
That is, each row should be a user followed by that user's borrowing history.
CodePudding user response:
You could try something like this:
import csv
from collections import OrderedDict

with open("test.csv") as csvfile:
    spamreader = csv.reader(csvfile, delimiter=",")
    firstLine = True
    # d = OrderedDict()  # use this instead if order is important
    d = dict()
    for row in spamreader:
        if firstLine:
            # Skip the header row
            firstLine = False
            continue
        # Create the list for this user on first sight, then add the title
        d.setdefault(row[0], [])
        d[row[0]].append(row[1])

for k, v in d.items():
    print(",".join([k] + v))
Result:
54,book title 2
1123,book title 1,book title 2,book title 3
776,book title 7,book title 1
CodePudding user response:
My Python 3 version, as I wrote it before I saw you already had an answer...
from collections import defaultdict

# Use a defaultdict so we don't need to worry about
# initialising the value of each key
result = defaultdict(list)

# The reader
# Input is like:
# userID bookTitle
# 1123 book title 1
# 1123 book title 2
# 1123 book title 3
# 54 book title 2
# 776 book title 7
# 776 book title 1
with open("myfile.csv") as fp:
    for line in fp:
        if line.startswith("userID"):
            # Ignore the first line
            continue
        parts = line.split()
        # The key is the first value, the rest of the
        # values constitute the book title.
        result[parts[0]].append(" ".join(parts[1:]))

# The printer
# Output should look like this:
# 1123, book title 1, book title 2, book title 3
# 54, book title 2
# 776, book title 7, book title 1
for key, value in result.items():
    print(', '.join([key] + value))
This is basically the same strategy as @baskettaz used, i.e. build a dictionary and then print it out in the desired format.
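If you want the result as CSV text rather than printed lines, the same build-a-dictionary strategy can be sketched roughly as below. This is only an illustration under a few assumptions: the input is comma-separated, the sample data is held in an in-memory io.StringIO instead of a real file, and group_titles is a made-up helper name, not something from either answer. It relies on plain dicts keeping insertion order, which is guaranteed from Python 3.7.

```python
import csv
import io
from collections import defaultdict


def group_titles(rows):
    """Map each userID to the list of titles that user borrowed, in input order.

    Hypothetical helper; expects an iterable of (userID, bookTitle) pairs.
    """
    grouped = defaultdict(list)
    for user_id, title in rows:
        grouped[user_id].append(title)
    return dict(grouped)


# Sample data from the question, written as comma-separated text (an assumption).
sample = io.StringIO(
    "userID,bookTitle\n"
    "1123,book title 1\n"
    "1123,book title 2\n"
    "1123,book title 3\n"
    "54,book title 2\n"
    "776,book title 7\n"
    "776,book title 1\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row
grouped = group_titles(reader)

# Write one row per user; csv.writer quotes any title that contains a comma.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for user_id, titles in grouped.items():
    writer.writerow([user_id] + titles)

print(out.getvalue(), end="")
```

For a real file you would swap the StringIO objects for open("myfile.csv", newline="") and an output file opened the same way in write mode. Going through csv.writer instead of ", ".join also keeps the output parseable when a book title itself contains a comma.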