Home > Software design >  Inconsistent quotes on .csv file
Inconsistent quotes on .csv file

Time:12-08

I have a comma delimited file which also contains commas in the actual field values, something like this:

foo,bar,"foo, bar"

This file is very large so I am wondering if there is a way in python to either put double quotes around ever field:

eg: "foo","bar","foo, bar"

or just change the delimeter overall?

eg: foo|bar|foo, bar

End goal: The goal is to ultimately load this file into sql server. Given the size of the file bulk insert is only feasible approach for loading but I cannot specify a text qualifier/field quote due to the version of ssms I have.

This leads me to believe the only remaining approach is to do some preprocessing on the source file.

CodePudding user response:

Changing the delimiter just requires parsing and re-encoding the data.

with open("data.csv") as input, open("new_data.csv", "w") as output:
    r = csv.reader(input, delimiter=",", quotechar='"')
    w = csv.writer(output, delimiter="|")
    w.writerows(r)

Given that your input file is a fairly standard version of CSV, you don't even need to specify the delimiter and quote arguments to reader; the defaults will suffice.

r = csv.reader(input)

CodePudding user response:

It is not an inconsistent quotes. If a value in a CSV file has comma or new line in it quotes are added to it. It shoudn't be a problem, because all standard CSV readers can read it properly.

  • Related