I have a Python 3 "file-like object" whose read() method returns a string; it comes from either an opened file or an object streamed from S3 using boto3.

I want to sanitize the stream before passing it to csv.DictReader, in particular because that module barfs on NUL characters in the input. The CSV files I'm processing may be large, so I want to do this "streaming", not reading the entire file/object into memory.

How do I wrap the input object so that I can clean up every string returned from read() with a call like .replace('\x00', '{NUL}')?

I think the io library is the place to look, but I couldn't find anything that obviously does what I want: intercept and transform every call to .read() on the underlying file-like object and pass the wrapper to csv, without reading the whole thing at once.
CodePudding user response:
You can use a simple generator function that cleans each line before passing it on to csv.reader (or csv.DictReader, which accepts the same kind of iterable):
import io
import csv

def denull(line_gen):
    # Replace NUL characters in each line before csv sees it
    for line in line_gen:
        yield line.replace('\x00', '{NUL}')

data = io.StringIO("""
hello;world
asdf;h\x00pla
""".strip())

for row in csv.reader(denull(data), delimiter=";"):
    print(row)
prints out
['hello', 'world']
['asdf', 'h{NUL}pla']
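One caveat: the generator above iterates the input line by line, which works for local file objects but not necessarily for every object that only exposes read() (iterating a boto3 StreamingBody, for instance, yields bytes chunks rather than text lines). A sketch closer to the question's exact shape, reading fixed-size chunks via read() and reassembling lines so the result can feed csv.DictReader, might look like this (denull_chunks and the chunk size are my own choices, not part of the answer above):

```python
import csv
import io

def denull_chunks(f, chunk_size=8192):
    """Yield cleaned lines from a text file-like object, calling
    read() in fixed-size chunks so large inputs are never loaded
    into memory at once."""
    buf = ''
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf += chunk.replace('\x00', '{NUL}')
        lines = buf.split('\n')
        buf = lines.pop()  # keep any trailing partial line for next round
        for line in lines:
            yield line + '\n'
    if buf:  # input did not end with a newline
        yield buf

data = io.StringIO("name;word\nasdf;h\x00pla\n")
for row in csv.DictReader(denull_chunks(data), delimiter=";"):
    print(row)
```

Because the replacement happens per chunk and NUL is a single character, a NUL can never straddle a chunk boundary; only line breaks need the buffering shown here.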