Home > Software engineering >  Why is Snowflake having trouble parsing this Avro file?
Why is Snowflake having trouble parsing this Avro file?

Time:07-28

I'm trying to bulk-load data into Snowflake with an Avro file. I'm able to create the file and PUT it to a Snowflake stage, but when I run the command

copy into MyTable from '@%"MyTable"/LinkTypes.avro' file_format = (type = avro) match_by_column_name = case_insensitive;

it errors out on me:

100084 (22P02): Error parsing AVRO: not an AVRO file (bad magic)

Looking at this file in a hex editor, I don't notice anything particularly bad about it. I'm able to read it with the same library that created it (Microsoft.Avro.Core from NuGet) and it loads just fine in this random online Avro viewer I googled up. But for whatever reason, Snowflake can't read it.

The file can be found here. My Avro data generation routine is below. Anyone have any idea what's wrong with it and how to fix it?

public void Write(string name, IDataReader data, DataDictionary dict)
{
    using var stream = OpenFileStream(name); // returns a FileStream; assume this works
    var schema = BuildAvroSchema(dict.GetSchema(name)); // returns a JSON string describing the schema; assume this works
    using var writer = AvroContainer.CreateGenericWriter(schema, stream, Codec.Deflate);
    var serializer = AvroSerializer.CreateGeneric(schema);
    WriteReader(reader, writer, stream, serializer.WriterSchema);
}

private static void WriteReader(IDataReader reader, IAvroWriter<object> writer, Stream stream, TypeSchema schema)
{
    using var sw = new SequentialWriter<object>(writer, 1024);
    var rec = new AvroRecord(schema);
    var len = reader.FieldCount;
    var buffer = new object[len];
    var names = Enumerable.Range(0, len).Select(i => reader.GetName(i)).ToArray();
    while (reader.Read()) {
        reader.GetValues(buffer);
        for (int i = 0; i < len;   i) {
            var value = buffer[i];
            if (value is DateTime dt) {
                value = dt.Ticks;
            }
            rec[names[i]] = value == DBNull.Value ? null : value;
        }
        sw.Write(rec);
    }
}

CodePudding user response:

I uploaded you avro file into a stage, and I'm able to load the data without any error.

create table testavro (id number, type varchar);
    
copy into testavro from @mystage/LinkTypes.avro file_format = (type = avro) match_by_column_name = case_insensitive;
    
select * from testavro;

 ---- ----------- 
| ID |   TYPE    |
 ---- ----------- 
|  1 | Linked    |
|  3 | Duplicate |
 ---- ----------- 

ls @mystage;

 --------------------------- ------ ---------------------------------- ------------------------------- 
|           name            | size |               md5                |         last_modified         |
 --------------------------- ------ ---------------------------------- ------------------------------- 
| mystage/LinkTypes.avro.gz |  208 | 14747c0a8e9ab98cbd942feef5ef2702 | Wed, 27 Jul 2022 15:03:48 GMT |
 --------------------------- ------ ---------------------------------- ------------------------------- 

So I think you should re-upload your file.

  • Related