How do I keep extra whitespace in columns when writing a dataframe to a file?
Say I have the dataframe below. When I print it, I can see the whitespace, but to_csv() from pandas strips the whitespace when writing to a file. The whitespace is preserved if I write the dataframe to the file as a string with file.write(str(df)), but this does not maintain the tabular format because the strings in a column are long.
a | b | c | d
------------------------------
10 | 1 | 100 | d
20 | 2 | 10 | d
The amount of whitespace varies from row to row because it depends on the column value. For example, in column c, the second row has one more space than the first row because the value in the first row is one character longer.
For context, I am reading the dataframe from a file with pd.read_csv(), and the whitespace is preserved. After manipulating the dataframe, I want to write it to another file, but to_csv() isn't preserving the whitespace. Another function such as np.savetxt(), or any other approach, is okay.
Edit: I found that the columns contain tabs instead of spaces. How do I deal with this?
CodePudding user response:
This is by design. Strictly speaking, the CSV format mandates that no extra space occurs between the delimiter and the field, even if most readers can be configured to ignore such spaces. Because to_csv writes conformant files, it will never write extra spaces.
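As a quick check of that behavior, here is a minimal sketch, assuming a small dataframe with integer columns similar to the one in the question (the data and the in-memory buffer are illustrative, not the asker's actual file):

```python
import io

import pandas as pd

# Hypothetical data modeled on the table in the question.
df = pd.DataFrame({"a": [10, 20], "b": [1, 2], "c": [100, 10], "d": ["d", "d"]})

# Write to an in-memory buffer instead of a real file for the demo.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()

# Each field is written without padding around the delimiter.
print(csv_text)
```

The printed lines have no alignment padding: each value sits directly next to its delimiter.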
What you want is not a CSV file but a fixed-width fields file. You can generate one with df.to_string(filename). The output will be close to what you get when printing the dataframe on your screen, and its documentation lists a number of parameters. For example, col_space can be used to set the minimum width of the columns:
df.to_string(filename, col_space=[5, 5, 8, 5], index=False)
will write it with no index and ensure that columns a, b, and d are at least 5 characters wide, and c at least 8 characters wide.
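For a self-contained version of that call, here is a minimal sketch, assuming the same illustrative four-column dataframe as in the question and a pandas version in which col_space accepts a list. It returns the text as a string rather than writing to a file; passing a path as the first argument, as in the call above, writes it to disk instead.

```python
import pandas as pd

# Hypothetical data modeled on the table in the question.
df = pd.DataFrame({"a": [10, 20], "b": [1, 2], "c": [100, 10], "d": ["d", "d"]})

# One minimum width per column; the list length must match the number of columns.
text = df.to_string(col_space=[5, 5, 8, 5], index=False)
lines = text.splitlines()

# Header line followed by one padded line per row.
print(text)
```

Because the columns are padded with spaces rather than separated by a delimiter, such a file is typically read back with pd.read_fwf() rather than pd.read_csv().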