Keep extra whitespaces while writing pandas dataframe to file

Time:12-17

How do I keep the extra whitespace in columns when writing a dataframe to a file?

Say I have the dataframe below. When I print it, I can see the whitespace, but pandas' to_csv() strips it when writing to a file. The whitespace is preserved if I write the dataframe's string representation with file.write(str(df)), but that does not keep the tabular format, because some column values are long.

a   |   b   |     c        | d
------------------------------
10  |   1   |     100      | d
20  |   2   |     10       | d

The number of spaces varies from row to row because it depends on the column value. For example, in column c the second row has one more space than the first row, because the value in the first row is one character longer.
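A minimal sketch of that padding rule, assuming each cell is left-justified to the width of its column's widest entry (header included) and columns are joined with " | ". The helper name to_padded_lines is hypothetical and this is a manual alternative, not a pandas feature; the padding for a cell is simply the column width minus the length of the value.

```python
import pandas as pd


def to_padded_lines(df, sep=" | "):
    """Hypothetical helper: left-justify every cell to its column's widest entry."""
    cols = [str(c) for c in df.columns]
    rows = [[str(v) for v in row] for row in df.itertuples(index=False)]
    # Column width = longest of the header and every value in that column.
    widths = [max(len(s) for s in [c] + [r[i] for r in rows]) for i, c in enumerate(cols)]
    header = sep.join(c.ljust(w) for c, w in zip(cols, widths))
    body = [sep.join(v.ljust(w) for v, w in zip(r, widths)) for r in rows]
    return [header] + body


df = pd.DataFrame({"a": [10, 20], "b": [1, 2], "c": [100, 10], "d": ["d", "d"]})
lines = to_padded_lines(df)
print("\n".join(lines))
```

Here "10" in column c gets one more trailing space than "100", matching the description above.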

For context, I am reading the dataframe from a file, and pd.read_csv() preserves the whitespace. After manipulating the dataframe, I want to write it to another file, but to_csv() doesn't preserve the whitespace. Alternatives such as np.savetxt(), or any other approach, are fine.

Edit: I found that the columns contain tabs instead of spaces. How do I deal with this?

Answer:

This is by design. A CSV file has no notion of column alignment: RFC 4180 treats any spaces next to a delimiter as part of the field (even though many readers can be configured to skip them), so to_csv writes each value exactly as stored and never pads fields with extra spaces.
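A quick check of that behaviour, assuming the example frame from the question with plain integer columns a, b, c and a string column d. Called without a path, to_csv returns the CSV text, which shows that no padding is added; a string cell that already contains spaces is written back unchanged.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"a": [10, 20], "b": [1, 2], "c": [100, 10], "d": ["d", "d"]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)

# String cells are written exactly as stored, surrounding spaces included.
padded = pd.DataFrame({"x": [" 100 "]}).to_csv(index=False)
print(padded)
```

So if the whitespace disappears, it is usually because the values were parsed as numbers (or stripped) before writing, not because to_csv removed it.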

What you want is not a CSV file but a fixed-width fields file. You can generate one with to_string: df.to_string(filename) writes the frame to filename, laid out much as it appears when you print the dataframe on screen. Its documentation lists a number of formatting parameters; for example, col_space sets the minimum width of each column:

df.to_string(filename, col_space=[5, 5, 8, 5], index=False)

This writes the frame without the index and ensures that columns a, b, and d are at least 5 characters wide, and c is at least 8 characters wide.
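A self-contained sketch of the call above, assuming the example frame from the question and a hypothetical output file in a temporary directory. It writes the file, reads it back, and prints the fixed-width text; passing a list to col_space requires a reasonably recent pandas release.

```python
import os
import tempfile

import pandas as pd

# The example frame from the question.
df = pd.DataFrame({"a": [10, 20], "b": [1, 2], "c": [100, 10], "d": ["d", "d"]})

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "frame.txt")

# Passing a path as the first argument (buf) writes the text to that file
# and returns None; col_space gives one minimum width per column (a, b, c, d).
df.to_string(path, col_space=[5, 5, 8, 5], index=False)

with open(path) as fh:
    text = fh.read()
print(text)
```

Numeric values are right-aligned within each column, so 100 and 10 in column c end at the same position. If the file later needs to be loaded again, pandas provides pd.read_fwf for reading fixed-width files.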
