I'm currently trying to read a csv file, add/delete/rename some columns using Pandas dataframe, and then write it back to the same file. However, a few of the files I'm using contain records which look like this:
Column 1,Column 2,Column 3,Column 4
123," ",ABCD,"This, that, and this"
I was able to successfully get Pandas to understand "This, that, and this" and write it back to the csv exactly the same with the quotation marks and commas. But unfortunately I can't seem to get the empty space surrounded by quotations " ". It will just write it back like this:
Column 1,Column 2,Column 3,Column 4
123, ,ABC,"This, that, and this"
My read looks like:
f = pd.read_csv((mypath file), skipinitialspace=True, quotechar='"')
And my write looks like:
f.to_csv((mypath file), index=False)
CodePudding user response:
It's still a valid CSV without the quotes around the space. A field only needs to be quoted if it contains the separator (comma).
You don't need skipinitalspace=True
on the read. That's only needed if the CSV looks like A, B, C, D
instead of A,B,C,D
.
The quotechar
parameter isn't needed either as "
is the default.
The best you can do with pandas
is quote everything if you need it:
import pandas as pd
import csv
f = pd.read_csv('sample.csv')
f.to_csv('out.csv',index=False, quoting=csv.QUOTE_ALL)
Given: sample.csv
Column 1,Column 2,Column 3,Column 4
123," ",ABCD,"This, that, and this"
Result: out.csv
"Column 1","Column 2","Column 3","Column 4"
"123"," ","ABCD","This, that, and this"
CodePudding user response:
as far as I know, you can use quoting in to_csv to quote all strings
import csv
import pandas as pd
f = pd.read_csv(('test.csv'), skipinitialspace=True, quotechar='"')
f.to_csv(('test2.csv'), quoting=csv.QUOTE_NONNUMERIC, index=False)
and the output like this
"Column 1","Column 2","Column 3","Column 4"
123," ","ABCD","This, that, and this"
or you can replace blank space with '" "' but the output will contain triple quote like """"""
f = pd.read_csv(('test.csv'), skipinitialspace=True, quotechar='"')
f["Column 2"] = f["Column 2"].replace(r'^\s*$', '\" \"', regex=True)
f.to_csv(('test2.csv'), index=False)
and the output like this
Column 1,Column 2,Column 3,Column 4
123,""" """,ABCD,"This, that, and this"