Hey there StackOverflow community,
I have a question about nested Avro schemas and the best practice for storing them in the schema registry when using them with Kafka.
TL;DR & Question: What’s the best practice for storing complex, nested types inside an Avro schema registry?
- a) all subtypes as separate subjects (as demonstrated below)
- b) a nested supertype as a single subject, containing all subtypes
- c) something different altogether?
A little context: Our schema consists of a main type that has a few complex subtypes (some of which have subtypes of their own). To keep things clean, we moved every complex type to its own *.avsc file. This leaves us with ~10 *.avsc files. All messages we produce have the main type, and subtypes are never sent separately.
For uploading/registering the schema, we use a Gradle plugin. For this to work, we need to fully specify every subtype as a separate subject and then define the references between them, like so (in build.gradle.kts):
schemaRegistry {
    url.set("https://$schemaRegistryPath")
    register {
        // each reference is given as (name, subject, version)
        subject("SubSubType1", "$projectDir/src/main/avro/SubSubType1.avsc", "AVRO")
        subject("SubType1", "$projectDir/src/main/avro/SubType1.avsc", "AVRO")
            .addReference("SubSubType1", "SubSubType1", -1)
        subject("MyMainType", "$projectDir/src/main/avro/MyMainType.avsc", "AVRO")
            .addReference("SubType1", "SubType1", -1)
        // remaining config omitted for brevity
    }
}
This results in all subtypes being registered in the schema registry as a separate subject:
curl -X GET http://schema-registry:8085/subjects
["MyMainType","SubType1","SubType2","SubType3","SubSubType1","SubSubType2"]
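For illustration, here is a rough sketch of the request body that the Schema Registry REST API accepts when a schema with references is registered (POST /subjects/<subject>/versions). Whether the Gradle plugin builds exactly this body is an assumption. The field named `first` is made up, and the concrete version `1` is a placeholder for the `-1` in the config, which presumably means "latest".

```python
import json


def registration_payload(schema, references=()):
    """Build a JSON-serialisable body for POST /subjects/<subject>/versions.

    `references` is an iterable of (name, subject, version) tuples, mirroring
    the three arguments passed to addReference in the Gradle config.
    """
    return {
        "schemaType": "AVRO",
        # The registry expects the schema itself as a JSON-encoded string.
        "schema": json.dumps(schema),
        "references": [
            {"name": name, "subject": subject, "version": version}
            for name, subject, version in references
        ],
    }


# Hypothetical main type: the field name "first" is made up for this sketch.
main_schema = {
    "type": "record",
    "name": "MyMainType",
    "fields": [{"name": "first", "type": "SubType1"}],
}

# Version 1 is a placeholder; the config in the question uses -1.
payload = registration_payload(main_schema, [("SubType1", "SubType1", 1)])
print(json.dumps(payload, indent=2))
```

Each reference points at another subject, which is why every referenced subtype has to exist as a subject of its own before the main type can be registered this way.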
Registering every subtype as its own subject feels awkward: we only ever produce Kafka messages with a payload of MyMainType, so I only need that type in the registry, with all subtypes nested inside it, like so:
curl -X GET http://schema-registry:8085/subjects
["MyMainType"]
This doesn't appear to be possible with this particular Gradle plugin, and other plugins seem to handle it identically. So apparently, when Avro subtypes are specified in separate files, the only way to register them is as separate subjects.
What should I do here? Register all subtypes, or merge all *.avsc files into one big file?
Thanks for any pointers everybody!
Answer:
Unfortunately, there doesn't seem to be a whole lot of information available on this topic, but this is what I found out regarding your options with complex Avro schemas:
- for simple schemas with few complex types, use Avro Schemas (*.avsc)
- for more complex schemas with lots of nesting, use Avro Interface Definitions (*.avdl); these natively support imports
So it would probably be worthwhile to convert the definitions to *.avdl. If you insist on keeping your *.avsc-style definitions, there are Maven plugins available for merging them (see https://michalklempa.com/2020/04/composing-avro-schemas-from-subtypes/).
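To give an idea of what such a merge involves, here is a minimal sketch in plain Python (not the Maven plugin from the link). It inlines each referenced named type at its first use, so the main type becomes one self-contained schema. The field names are made up for illustration, and the sketch assumes references use exactly the `name` of the target type, with no namespace resolution.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def inline_references(schema, named_types, defined=None):
    """Return a copy of `schema` in which each reference to a type from
    `named_types` is replaced by its full definition on first use.
    Later uses keep the bare name, since Avro allows a named type to be
    defined only once per schema."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema in PRIMITIVES or schema in defined or schema not in named_types:
            return schema
        return inline_references(named_types[schema], named_types, defined)
    if isinstance(schema, list):  # union
        return [inline_references(s, named_types, defined) for s in schema]
    result = dict(schema)
    kind = schema.get("type")
    if kind in ("record", "enum", "fixed"):
        # Mark as defined before descending, so recursive types stay as names.
        defined.add(schema["name"])
    if kind == "record":
        result["fields"] = [
            {**f, "type": inline_references(f["type"], named_types, defined)}
            for f in schema["fields"]
        ]
    elif kind == "array":
        result["items"] = inline_references(schema["items"], named_types, defined)
    elif kind == "map":
        result["values"] = inline_references(schema["values"], named_types, defined)
    elif kind not in ("enum", "fixed"):
        result["type"] = inline_references(kind, named_types, defined)
    return result


# Hypothetical contents of three *.avsc files (field names are made up).
sub_sub = {"type": "record", "name": "SubSubType1",
           "fields": [{"name": "value", "type": "string"}]}
sub = {"type": "record", "name": "SubType1",
       "fields": [{"name": "inner", "type": "SubSubType1"}]}
main = {"type": "record", "name": "MyMainType",
        "fields": [{"name": "first", "type": "SubType1"},
                   {"name": "second", "type": ["null", "SubType1"]}]}

named = {s["name"]: s for s in (sub_sub, sub)}
merged = inline_references(main, named)
print(json.dumps(merged, indent=2))
```

The merged result could then be registered as a single MyMainType subject without any references. The trade-off is that the subtypes can no longer be versioned or reused independently.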
However, the impression that I get is that whenever things get complex, it would be preferable to use Avro IDL. This blog post supports this hypothesis.