Python. Parse CSV file. Append value to the previous row if the next row has the same ID

I have a CSV file of around 8000 rows and 2 columns:

userID     bookTitle
1123       book title 1
1123       book title 2
1123       book title 3
54         book title 2
776        book title 7
776        book title 1

I need it to be transformed to:

1123, book title 1, book title 2, book title 3
54, book title 2
776, book title 7, book title 1

meaning that each row is one user followed by their borrowing history.

CodePudding user response:

You could try something like this:

import csv
from collections import OrderedDict

with open("test.csv") as csvfile:
    spamreader = csv.reader(csvfile, delimiter=",")

    firstLine = True
    # d = OrderedDict()  # use this instead if order is important
    d = dict()
    for row in spamreader:
        if firstLine:
            # skip the header row
            firstLine = False
            continue
        # create the list for this userID on first sight, then append the title
        d.setdefault(row[0], [])
        d[row[0]].append(row[1])

for k, v in d.items():
    print(",".join([k] + v))

Result:

54,book title 2
1123,book title 1,book title 2,book title 3
776,book title 7,book title 1
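Note that the result above is not in the same order as the input. That is expected with a plain dict on older Python versions, where dict ordering was not guaranteed. A minimal sketch of the same idea for Python 3.7+, where a plain dict keeps insertion order, assuming the file really is comma-separated. The function name group_rows and the inline sample data are made up for illustration:

```python
import csv
import io


def group_rows(lines):
    """Group book titles by user ID, keeping the first-seen order of IDs."""
    reader = csv.reader(lines, delimiter=",")
    next(reader, None)  # skip the header row
    grouped = {}  # plain dicts keep insertion order in Python 3.7+
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        grouped.setdefault(row[0], []).append(row[1])
    return grouped


# Hypothetical sample standing in for the real file
sample = io.StringIO(
    "userID,bookTitle\n"
    "1123,book title 1\n"
    "1123,book title 2\n"
    "1123,book title 3\n"
    "54,book title 2\n"
    "776,book title 7\n"
    "776,book title 1\n"
)

for user_id, titles in group_rows(sample).items():
    print(", ".join([user_id] + titles))
```

With a real file you would pass the open file object instead of the StringIO, e.g. group_rows(open("test.csv", newline="")).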

CodePudding user response:

My Python 3 version, as I wrote it before I saw you already had an answer...

from collections import defaultdict

# Use a defaultdict so we don't need to worry about
# initialising the value of each key
result = defaultdict(list)

# The reader
# Input is like:
# userID     bookTitle
# 1123       book title 1
# 1123       book title 2
# 1123       book title 3
# 54         book title 2
# 776        book title 7
# 776        book title 1 
with open("myfile.csv") as fp:
    for line in fp.readlines():
        if line.startswith("userID"):
            # Ignore the first line
            continue
        else:
            parts = line.split()
            # The key is the first value, the rest of the
            # values constitute the book title.
            result[parts[0]].append(" ".join(parts[1:]))
           
# The printer   
# Output should look like this:
# 1123, book title 1, book title 2, book title 3
# 54, book title 2
# 776, book title 7, book title 1
for key, value in result.items():
    print(', '.join([key] + value))

This is basically the same strategy as @baskettaz used, i.e. build a dictionary and then print it out in the desired format.
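If you want the grouped rows written back out as CSV rather than printed, the same build-a-dictionary strategy could be paired with csv.writer, which also takes care of quoting titles that contain a comma. This is only a sketch: write_grouped and the sample rows are hypothetical names, and it writes to an in-memory buffer so it runs on its own.

```python
import csv
import io
from collections import defaultdict


def write_grouped(rows, out):
    """Write one line per user: the ID followed by all of that user's titles."""
    grouped = defaultdict(list)
    for user_id, title in rows:
        grouped[user_id].append(title)
    # lineterminator="\n" avoids the csv module's default "\r\n"
    writer = csv.writer(out, lineterminator="\n")
    for user_id, titles in grouped.items():
        writer.writerow([user_id] + titles)


# Hypothetical (userID, bookTitle) pairs, as they would come from the reader
rows = [
    ("1123", "book title 1"),
    ("1123", "book title 2"),
    ("1123", "book title 3"),
    ("54", "book title 2"),
    ("776", "book title 7"),
    ("776", "book title 1"),
]

buffer = io.StringIO()
write_grouped(rows, buffer)
print(buffer.getvalue(), end="")
```

To write to disk, pass a file opened with open("out.csv", "w", newline="") instead of the buffer.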