I have a comma delimited file which also contains commas in the actual field values, something like this:
foo,bar,"foo, bar"
This file is very large so I am wondering if there is a way in python to either put double quotes around ever field:
eg: "foo","bar","foo, bar"
or just change the delimeter overall?
eg: foo|bar|foo, bar
End goal: The goal is to ultimately load this file into sql server. Given the size of the file bulk insert is only feasible approach for loading but I cannot specify a text qualifier/field quote due to the version of ssms I have.
This leads me to believe the only remaining approach is to do some preprocessing on the source file.
CodePudding user response:
Changing the delimiter just requires parsing and re-encoding the data.
with open("data.csv") as input, open("new_data.csv", "w") as output:
r = csv.reader(input, delimiter=",", quotechar='"')
w = csv.writer(output, delimiter="|")
w.writerows(r)
Given that your input file is a fairly standard version of CSV, you don't even need to specify the delimiter and quote arguments to reader
; the defaults will suffice.
r = csv.reader(input)
CodePudding user response:
It is not an inconsistent quotes. If a value in a CSV file has comma or new line in it quotes are added to it. It shoudn't be a problem, because all standard CSV readers can read it properly.