Python. Parse csv file. Append value to the previous row if the next row has the same ID

Time:03-08

I have a CSV file of around 8000 rows and 2 columns:

userID     bookTitle
1123       book title 1
1123       book title 2
1123       book title 3
54         book title 2
776        book title 7
776        book title 1 

I need it to be transformed to:

1123, book title 1, book title 2, book title 3
54, book title 2
776, book title 7, book title 1

That is, each row should hold one user ID followed by that user's borrowing history.

CodePudding user response:

You could try something like this (it assumes the file is comma-delimited):

import csv
from collections import OrderedDict
from pprint import pprint
with open("test.csv") as csvfile:
    spamreader = csv.reader(csvfile, delimiter = ",")

    firstLine = True
    # d = OrderedDict()  # use this instead if order is important
    d = dict()
    for row in spamreader:
        if firstLine:
            firstLine = False
            continue
        d.setdefault(row[0], [])
        d[row[0]].append(row[1])

for k,v in d.items():
    print(",".join([k] + v))

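Result (a plain dict does not guarantee key order before Python 3.7, so the rows may come out in a different order than the input):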

54,book title 2
1123,book title 1,book title 2,book title 3
776,book title 7,book title 1
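
If the input order of the user IDs should be kept, a plain dict is enough on Python 3.7 or later, where dicts preserve insertion order. The following is only a sketch under that assumption; the function name group_titles and the inline sample text are made up for illustration, and the input is assumed to be comma-separated as in the answer above.

```python
import csv
import io


def group_titles(csv_text):
    """Group book titles by user ID, keeping the first-seen order of IDs."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader, None)  # skip the header row
    grouped = {}  # insertion-ordered on Python 3.7+
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        grouped.setdefault(row[0], []).append(row[1].strip())
    return grouped


# Hypothetical comma-separated version of the sample data from the question.
sample = """userID,bookTitle
1123,book title 1
1123,book title 2
1123,book title 3
54,book title 2
776,book title 7
776,book title 1
"""

for user_id, titles in group_titles(sample).items():
    print(", ".join([user_id] + titles))
```

With this sample the rows print in the order 1123, 54, 776, matching the input. On older Python versions, swapping the plain dict for collections.OrderedDict should give the same ordering.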

CodePudding user response:

My Python 3 version, as I wrote it before I saw you already had an answer...

from collections import defaultdict

# Use a defaultdict so we don't need to worry about
# initialising the value of each key
result = defaultdict(list)

# The reader
# Input is like:
# userID     bookTitle
# 1123       book title 1
# 1123       book title 2
# 1123       book title 3
# 54         book title 2
# 776        book title 7
# 776        book title 1 
with open("myfile.csv") as fp:
    for line in fp:
        if line.startswith("userID"):
            # Ignore the first line
            continue
        else:
            parts = line.split()
            # The key is the first value, the rest of the
            # values constitute the book title.
            result[parts[0]].append(" ".join(parts[1:]))
           
# The printer   
# Output should look like this:
# 1123, book title 1, book title 2, book title 3
# 54, book title 2
# 776, book title 7, book title 1
for key, value in result.items():
    print(', '.join([key] + value))

This is basically the same strategy as @baskettaz used, i.e. build a dictionary and then print it out in the desired format.
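
A different technique, closer to the wording of the title, is to merge only rows that are next to each other using itertools.groupby instead of building a dictionary. This is a sketch, not a drop-in replacement: the function name merge_consecutive is hypothetical, the input is assumed to be whitespace-separated as shown in the question, and an ID that reappears later in the file would start a new output row rather than being appended to its earlier one.

```python
from itertools import groupby


def merge_consecutive(lines):
    """Merge adjacent lines that share the same user ID into one output row."""
    # Split each non-blank line into [user_id, title]; the title keeps its spaces.
    rows = (line.split(None, 1) for line in lines if line.strip())
    merged = []
    # groupby only groups neighbouring rows with equal keys.
    for user_id, group in groupby(rows, key=lambda parts: parts[0]):
        titles = [parts[1].strip() for parts in group if len(parts) > 1]
        merged.append(", ".join([user_id] + titles))
    return merged


# Sample data from the question, header included.
lines = [
    "userID     bookTitle",
    "1123       book title 1",
    "1123       book title 2",
    "1123       book title 3",
    "54         book title 2",
    "776        book title 7",
    "776        book title 1 ",
]

# Skip the header line before merging.
for row in merge_consecutive(lines[1:]):
    print(row)
```

Because the rows in the question are already grouped by user ID, this yields the same three rows as the dictionary approach. If the file is not sorted or grouped by ID, the dictionary strategy is the safer choice.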
