Python3 streaming string replacement

Time:11-03

I have a Python 3 "file-like object" whose read() method returns a string. It comes from either an opened file or an object streamed from S3 using boto3.

I want to sanitize the stream before passing it to csv.DictReader, in particular because that module barfs on NUL characters in the input.

The CSV files I'm processing may be large, so I want to do this in a streaming fashion rather than reading the entire file/object into memory.

How do I wrap the input object so that I can clean up every string returned from read() with a call like: .replace('\x00', '{NUL}')?

I think the io library is where to look, but I couldn't find anything that obviously does what I want: intercept and transform every call to .read() on the underlying file-like object, and pass the wrapper to csv, without reading the whole thing at once.

CodePudding user response:

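You can use a simple generator function that fixes the data before passing it on to csv.reader. The csv readers accept any iterable of strings, and a text file object yields one line at a time when iterated, so only the current line is held in memory: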

import io
import csv


def denull(line_gen):
    # Lazily yield each line with NUL characters replaced by a visible marker.
    for line in line_gen:
        yield line.replace('\x00', '{NUL}')


data = io.StringIO("""
hello;world
asdf;h\x00pla
""".strip())

for row in csv.reader(denull(data), delimiter=";"):
    print(row)

This prints:

['hello', 'world']
['asdf', 'h{NUL}pla']
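
Since the question mentions csv.DictReader and intercepting read(), here is a rough sketch of both, not a definitive implementation. The first part feeds the same denull generator to csv.DictReader. The second part is a hypothetical DenullReader class (the name and the sample data are made up for illustration) that cleans each chunk returned by read(), for consumers that call read() instead of iterating over lines.

```python
import csv
import io


def denull(lines):
    """Yield each line with NUL characters replaced by a visible marker."""
    for line in lines:
        yield line.replace('\x00', '{NUL}')


class DenullReader:
    """Hypothetical wrapper that cleans every chunk returned by read()."""

    def __init__(self, raw):
        self.raw = raw  # assumed to be a text-mode file-like object

    def read(self, size=-1):
        # '\x00' is a single character, so it can never be split across two
        # chunks; replacing chunk by chunk is safe. Note that the returned
        # string may be longer than `size` after the replacement.
        return self.raw.read(size).replace('\x00', '{NUL}')


# csv.DictReader takes the header from the first line the generator yields.
data = io.StringIO("name;value\nasdf;h\x00pla\n")
rows = list(csv.DictReader(denull(data), delimiter=";"))

# Chunked reads through the wrapper, for code that calls read() directly.
chunked = DenullReader(io.StringIO("ab\x00cd"))
chunks = [chunked.read(3), chunked.read(3)]
```

The csv module iterates over its input rather than calling read(), so the generator is the piece that matters for csv.DictReader. When the source is a real file, the csv documentation recommends opening it with newline='' so that newlines embedded in quoted fields are handled correctly.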