Home > front end >  Replace newlines in the middle of .csv columns
Replace newlines in the middle of .csv columns

Time:06-08

I have a CSV file which can look like this:

Col1,"Col2","Col31
Col32
Col33"

That means if I use this code:

with open(inpPath, "r ") as csvFile:
    wb_obj = list(csv.reader(csvFile, delimiter=','))
    for row in wb_obj:
        print(row)

The output looks like this:

[Col1,"Col2","Col31\nCol32\nCol33"]

So I am trying to replace the \n characters with spaces so the CSV file would be rewritten like this: Col1,"Col2","Col31 Col32 Col33" I have written this short function but it results in Error: Cannot open mapping csv, Exception thrown I/O operation on closed file.

def processCSV(fileName):
    with open(fileName, "rU") as csvFile:
        filtered = (line.replace('\n', ' ') for line in csvFile)
        wb_obj = csv.reader(filtered, delimiter=",")
    return wb_obj

How could I fix that? Thank you very much for any help

CodePudding user response:

Your processCSV function returns an iterable based on the file object csvFile, and yet when the iterable gets consumed by the caller of the processCSV function, the file object csvFile is already closed by the context manager for being outside the with statement, hence the said error. The common pattern for such a function is to make it accept a file object as a parameter, and let the caller open the file instead to pass the file object to the function so that the file can remain open for the caller.

You also should not replace all newlines with spaces to begin with, since you really only want to replace newlines if they are within double quotes, which would be parsed by csv.reader as part of a column value rather than row separators. Instead, you should let csv.reader do its parsing first, and then replace newline characters with spaces in all column values:

def processCSV(file):
    return ([col.replace('\n', ' ') for col in row] for row in csv.reader(file))

# the caller
with open(filename) as file:
    for row in processCSV(file):
        print(*row, sep=',')

CodePudding user response:

This comes down to using a generator expression inside of the file context. See a longer explanation here: https://stackoverflow.com/a/39656712/15981783.

You can change the generator expression to a list comprehension and it will work:

import csv

def processCSV(fileName):
    with open(fileName, "r") as csvFile:
        # list comprehension used here instead
        filtered = [line.replace('\n', ' ') for line in csvFile]
        wb_obj = csv.reader(filtered, delimiter=",")
    return wb_obj

print(list(processCSV("tmp.csv")))

Returns:

[['Col1', 'Col2', 'Col31 Col32 Col33']]
  • Related