Home > OS >  How to save pandas textmanipulation as csv in the correct form
How to save pandas textmanipulation as csv in the correct form

Time:07-07

I have a *.txt file with numbers. I want to eliminate the spaces. The raw data looks like this

12 12345 1234
23 23456 234

If I use the following

data=data[0].str.replace(" ","")
data.update('\''   data   '\',')

I get

 '1234123451234',
 '2323456234',

which I want. But if I save it to csv with

data.to_csv("/Users/k/file.txt", header=None, index=None, mode='a')

I get as file values:

 "'1234123451234',"
 "'2323456234',"

If I use the quoating = csv.None or 3 (same)

data.to_csv("Users/k/file.txt", header=None, index=None, quoting=3, escapechar="\\", mode='a')

The file looks like:

 '1234123451234'\,
 '2323456234'\,

Just using space or nothing as escapechar does not work.

If I just remove the spaces without adding quotes or commas and then save via:

data.to_csv("Users/k/file.txt", header=None, index=None, mode='a', quoting=1, sep=",")

I get:

 "1234123451234"
 "2323456234"

missing the comma.

Adding only the comma and saving as above gets me

 "1234123451234,"
 "2323456234,"

wrong place :-)

As you can see, I am getting mad over missing my target by inches, while it is most likely super easy. I probably will switch to regex :-)

CodePudding user response:

The output is expected. As you manually added ' to the strings, they are retained in the output. There's no need to manually add quotation marks at all, just set the correct options for to_csv:

>>> df = pd.DataFrame(["12 12345 1234", "23 23456 234"])
>>> df
               0
0  12 12345 1234
1   23 23456 234

>>> df[0] = df[0].str.replace(" ", "")
>>> df
             0
0  12123451234
1   2323456234

>>> df.to_csv("output.csv", quoting=1, quotechar="'", line_terminator=',\n')

# output.csv
'','0',
'0','12123451234',
'1','2323456234',

CodePudding user response:

I would suggest not using Pandas at all, given that your transformation is pretty simple.

Something like this should work:

with open('input.txt', 'rt') as f_in:
    with open('output.txt', 'wt') as f_out:
        for line in f_in:
            line = line.strip()
            line = line.replace(' ', '')
            line = '\''   line   '\',\n'
            f_out.write(line)
  • Related