I have a fixed-width text file that I am trying to read in using pandas.read_fwf. As noted here, this method strips leading and trailing whitespace from each field. To get around that, I'd like to replace every space with a filler character, read the file in as a DataFrame, do my manipulation and editing, restore each filler character to a space, and write the file back out as a text file.
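To make the stripping concrete, here is a minimal sketch that reproduces it in memory. The one-line sample, the variable names, and the `header=None` / `dtype=str` arguments are assumptions chosen for illustration; they are not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical sample: three 8-character fields padded with spaces.
fields = ["  ab    ", "cd      ", "      ef"]
raw = "".join(fields) + "\n"

# header=None and dtype=str are assumptions, so the single line is treated
# as data and every field is kept as text.
stripped = pd.read_fwf(io.StringIO(raw), widths=[8, 8, 8],
                       header=None, dtype=str)

# The same line with every space swapped for the filler character.
filled = pd.read_fwf(io.StringIO(raw.replace(" ", "~")), widths=[8, 8, 8],
                     header=None, dtype=str)

print(stripped.iloc[0].tolist())  # padding has been stripped
print(filled.iloc[0].tolist())    # padding survives as '~'
```

The second call keeps the full 8-character fields because `~` is not treated as padding by default, which is what the filler trick relies on.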
At first I replaced the whitespace with the tilde character (~) by hand and removed it again at the end, using Notepad's find/replace, but this is slow and definitely something Python should be able to do for me.
My current method is convoluted, but it does work. I read the file in, make the whitespace replacements, write the result out to a temp file, then read that back into pandas as a fixed-width file. I do the same thing in the opposite direction at the end of my program.
Reading stage (replacing whitespace with ~):
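import pandas

# Read the raw file and swap every space for the filler character.
with open("input.txt") as inFile:
    txt1 = inFile.read().replace(" ", "~")
# Write the filled text to a temp file...
with open("input_temp.txt", 'w') as outFile:
    outFile.write(txt1)
# ...and parse that temp file as fixed-width.
with open("input_temp.txt") as inFile:
    df = pandas.read_fwf(inFile, widths=[8, 8, 8])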
Writing stage (replacing ~ with whitespace):
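import numpy as np

# Dump the DataFrame rows to a temp file with no delimiter between fields.
with open("output_temp.txt", 'w') as outFile:
    np.savetxt(outFile, df.values, fmt='%s', delimiter='')
# Read the temp file back and restore the spaces.
with open("output_temp.txt") as inFile:
    txt2 = inFile.read().replace("~", " ")
with open("output.txt", 'w') as outFile:
    outFile.write(txt2)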
Efficiency and memory use aren't huge concerns, but I would still like a cleaner way of doing this.
CodePudding user response:
You can use io.StringIO as an in-memory file-like object to read from, which removes the need for the temp file:
import io
import pandas

# Wrap the filled text in a StringIO so read_fwf can treat it like a file.
with open("input.txt") as inFile:
    txt1 = io.StringIO(inFile.read().replace(" ", "~"))
df = pandas.read_fwf(txt1, widths=[8, 8, 8])
and as one to write to:
import numpy as np

# Let savetxt write into memory, then restore the spaces before saving.
out_text = io.StringIO()
np.savetxt(out_text, df.values, fmt='%s', delimiter='')
txt2 = out_text.getvalue().replace("~", " ")
with open("output.txt", 'w') as outFile:
    outFile.write(txt2)
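Putting both halves together, the round trip might look like the sketch below. It is only an illustration under stated assumptions: the helper names `read_fwf_keep_spaces` and `to_fwf_text` are hypothetical, the sample text is made up, `header=None` and `dtype=str` are added so every line is treated as text data, every line is assumed to be full width, and a plain `str.join` is used in place of `np.savetxt` to keep the example pandas-only.

```python
import io
import pandas as pd

FILLER = "~"  # assumed never to occur in the real data


def read_fwf_keep_spaces(text, widths):
    """Parse fixed-width text, keeping padding as FILLER (hypothetical helper)."""
    filled = io.StringIO(text.replace(" ", FILLER))
    # header=None and dtype=str are assumptions: no header row, all text.
    return pd.read_fwf(filled, widths=widths, header=None, dtype=str)


def to_fwf_text(df):
    """Join each row back into one line and restore the spaces."""
    lines = ("".join(row) for row in df.itertuples(index=False, name=None))
    return "\n".join(lines).replace(FILLER, " ") + "\n"


# Made-up sample: two lines of three 8-character fields each.
rows = [["  ab    ", "cd      ", "      ef"],
        [" 1.5    ", "x y     ", "      z "]]
sample = "".join("".join(r) + "\n" for r in rows)

df = read_fwf_keep_spaces(sample, widths=[8, 8, 8])

# Example edit that keeps the field width unchanged.
df[0] = df[0].str.replace("ab", "AB", regex=False)

round_trip = to_fwf_text(df)
print(round_trip, end="")
```

Any edit made while the data is in the DataFrame has to preserve each field's width (and use `~` rather than a space for padding), otherwise the columns in the output file will no longer line up.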