ISO 8601 dates in Avro schema

Time:11-24

Is it possible to use date-time fields such as "2019-08-24T14:15:22.000Z" in Avro?

The docs say that dates and timestamps must use an int or long type annotated with a logical type. But in that case, the date has to be stored as an epoch-based number rather than as an ISO 8601 string.

I'm looking for something like this:

{
    "name": "myDateField",
    "type": "string",
    "logicalType": "timestamp-micros"
}

but it seems that logicalType is ignored in this case, so any arbitrary string can be written to that field.

Answer:

The idea behind logical types is that the Avro library you are using does the conversion for you.

Assume you had a schema like this:

{
    "type": "record",
    "name": "root",
    "fields": [
        {
            "name": "mydate",
            "type": {
                "type": "int",
                "logicalType": "date"
            }
        }
    ]
}

If you wanted to use this schema in Python (for example), you would create a record like so:

from datetime import date
record = {"mydate": date(2021, 11, 19)}

The Avro library you are using is responsible for taking the date object, converting it to the underlying int (for the date logical type, the number of days since the Unix epoch, 1970-01-01), and then serializing that int.

Likewise, when reading that record back out, the library is responsible for first converting the underlying int back into the date object. From a user perspective, you don't have to worry about the conversion and simply get to use higher level types.
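To make that conversion concrete, here is a minimal Python sketch of roughly what a library does under the hood for the date and timestamp-micros logical types, using only the standard library. The function names are hypothetical and not part of any Avro package. It also shows that an ISO 8601 string like the one in the question can be parsed into the long that timestamp-micros expects, and formatted back.

```python
from datetime import date, datetime, timedelta, timezone

# The Avro spec counts days / microseconds from the Unix epoch (UTC).
EPOCH_DATE = date(1970, 1, 1)
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_avro(d: date) -> int:
    """'date' logical type: days since 1970-01-01, stored as an Avro int."""
    return (d - EPOCH_DATE).days


def avro_to_date(days: int) -> date:
    """Inverse of date_to_avro."""
    return EPOCH_DATE + timedelta(days=days)


def iso8601_to_timestamp_micros(text: str) -> int:
    """'timestamp-micros' logical type: microseconds since the epoch, stored as an Avro long."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)  # raises ValueError for non-ISO 8601 input
    if dt.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    delta = dt - EPOCH_UTC
    # Pure integer arithmetic, so large timestamps are not rounded
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def timestamp_micros_to_iso8601(micros: int) -> str:
    """Inverse direction; millisecond precision to match the question's format (extra digits truncated)."""
    dt = EPOCH_UTC + timedelta(microseconds=micros)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

With a library that supports logical types you would normally just pass `date` or timezone-aware `datetime` objects and let it do this arithmetic; the sketch is mainly useful for seeing what actually ends up on the wire.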

Answer:

Assume you have a simple POJO:

import java.time.ZonedDateTime;

public class AvroEvent {
  public ZonedDateTime time;
}

You could then define a custom logical type backed by an Avro string, together with a conversion that maps it to ZonedDateTime:

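import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import org.apache.avro.Conversion;
import org.apache.avro.LogicalType;
import org.apache.avro.Schema;

// Maps java.time.ZonedDateTime to an Avro string carrying a custom logical type
public class ZonedDateTimeConversion extends Conversion<ZonedDateTime> {
  @Override
  public Class<ZonedDateTime> getConvertedType() {
    return ZonedDateTime.class;
  }

  @Override
  public String getLogicalTypeName() {
    return "zoneddatetime-string";
  }

  // Schema that ReflectData uses for ZonedDateTime fields
  @Override
  public Schema getRecommendedSchema() {
    return new ZonedDateTimeString().addToSchema(Schema.create(Schema.Type.STRING));
  }

  // Read path: throws DateTimeParseException if the stored string is not a valid ISO date-time
  @Override
  public ZonedDateTime fromCharSequence(CharSequence value, Schema schema, LogicalType type) {
    return ZonedDateTime.parse(value, DateTimeFormatter.ISO_ZONED_DATE_TIME);
  }

  // Write path: e.g. 2011-12-03T10:15:30+01:00[Europe/Paris]
  @Override
  public CharSequence toCharSequence(ZonedDateTime value, Schema schema, LogicalType type) {
    return value.format(DateTimeFormatter.ISO_ZONED_DATE_TIME);
  }

  public static class ZonedDateTimeString extends LogicalType {
    private ZonedDateTimeString() {
      super("zoneddatetime-string");
    }

    // Rejects schemas that attach this logical type to anything but a string
    @Override
    public void validate(Schema schema) {
      super.validate(schema);
      if (schema.getType() != Schema.Type.STRING) {
        throw new IllegalArgumentException(
            "ZonedDateTime (string) can only be used with an underlying string type");
      }
    }
  }
}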

Then register the conversion with an Avro data model and use that model to serialize and deserialize your POJO:

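    // Requires org.apache.avro.reflect.ReflectData and org.apache.avro.message.BinaryMessageEncoder
    var model = new ReflectData();
    // Register the conversion so ReflectData maps ZonedDateTime fields to the string logical type
    model.addLogicalTypeConversion(new ZonedDateTimeConversion());

    var schema = model.getSchema(AvroEvent.class);

    var encoder = new BinaryMessageEncoder<AvroEvent>(model, schema);

    // encode returns a java.nio.ByteBuffer and may throw IOException
    var data = encoder.encode(...);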

This way, only valid times can be serialized, and deserializing an invalid time throws an exception.
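For comparison, the checks the Java conversion performs on write, on read, and on the schema can be sketched in Python with the standard library. This is not an Avro API: the function names are hypothetical stand-ins for toCharSequence, fromCharSequence, and LogicalType.validate, and the parser is a simplification that accepts only UTC offsets (such as Z or +02:00), not the bracketed region IDs that ISO_ZONED_DATE_TIME also understands.

```python
from datetime import datetime

# Hypothetical name, mirroring the Java answer's custom logical type
LOGICAL_TYPE_NAME = "zoneddatetime-string"


def validate_schema(schema: dict) -> None:
    """Like ZonedDateTimeString.validate: only a string may carry this logical type."""
    if schema.get("logicalType") == LOGICAL_TYPE_NAME and schema.get("type") != "string":
        raise ValueError(f"{LOGICAL_TYPE_NAME} can only be used with an underlying string type")


def to_char_sequence(value: datetime) -> str:
    """Write path: only timezone-aware datetimes can become the stored string."""
    if value.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return value.isoformat()


def from_char_sequence(text: str) -> datetime:
    """Read path: fails loudly on anything that is not an ISO 8601 date-time with an offset."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    value = datetime.fromisoformat(text)  # raises ValueError on malformed input
    if value.tzinfo is None:
        raise ValueError(f"missing UTC offset: {text!r}")
    return value
```

The design choice is the same as in the Java version: the value is stored as a human-readable string, so correctness depends on validating at both ends rather than on the underlying Avro type.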

See https://github.com/fillmore-labs/avro-logical-type-conversion for a running example.
