I'm trying to read Avro files stored in S3 by a vendor and write to a DW. See code below. (Was roughly working from this S/O thread.)
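import io
from avro.datafile import DataFileReader
from avro.io import DatumReader

obj = obj.get()  # obj is assumed to be a boto3 S3 Object
raw_bytes = obj["Body"].read()
avro_bytes = io.BytesIO(raw_bytes)
reader = DataFileReader(avro_bytes, DatumReader())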
The code fails on the last line, where I get the error:
AttributeError: '_io.StringIO' object has no attribute 'mode'
That error comes from this spot in the source code, where DataFileReader is initialized.
def __init__(self, reader: IO[AnyStr], datum_reader: avro.io.DatumReader) -> None:
    if "b" not in reader.mode:
        warnings.warn(avro.errors.AvroWarning(f"Reader binary data from a reader {reader!r} that's opened for text"))
    bytes_reader = getattr(reader, "buffer", reader)
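The check on reader.mode can be seen failing without Avro at all. The following is a minimal sketch using only the standard library; the names streams and missing_mode are made up for illustration. It shows that in-memory streams carry no mode attribute, which is what the constructor above tries to read.

```python
import io

# In-memory streams, unlike file objects returned by open(), expose no `mode`.
streams = {
    "BytesIO": io.BytesIO(b"example"),
    "StringIO": io.StringIO("example"),
}

# True for each stream that lacks the attribute DataFileReader looks up.
missing_mode = {name: not hasattr(stream, "mode") for name, stream in streams.items()}
print(missing_mode)
```

So swapping BytesIO for StringIO would be expected to hit the same AttributeError.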
I've tried using avro_bytes as StringIO as well to see if that would help, but it didn't.
Any ideas how to get past that AttributeError?
CodePudding user response:
This is a bug in avro version 1.11.0. It has been fixed, but a new version containing the fix hasn't been released yet: https://issues.apache.org/jira/browse/AVRO-3252.
To resolve this, you can do one of the following:
- Wait until the new version is released
- Patch the call so that it doesn't do that check
- Instead of using BytesIO, you could make your own wrapper object that mimics BytesIO but has a mode attribute.
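The last option could look roughly like the sketch below. ModeBytesIO is a hypothetical helper name, not part of avro, and the bytes are a placeholder rather than a real Avro file; the idea is just to subclass BytesIO and add the mode attribute that the 1.11.0 constructor checks.

```python
import io


class ModeBytesIO(io.BytesIO):
    """BytesIO with a file-like `mode` attribute (hypothetical helper name)."""

    # Reports binary read mode so a `"b" not in reader.mode` check passes.
    mode = "rb"


# Placeholder payload; in the question this would be obj["Body"].read().
raw_bytes = b"placeholder bytes, not a real Avro file"
avro_bytes = ModeBytesIO(raw_bytes)
```

With the real payload, avro_bytes should then be usable in place of the plain BytesIO in DataFileReader(avro_bytes, DatumReader()). Once a fixed avro release is installed, the wrapper should no longer be needed.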