As the title states, I'm using the Debezium Postgres source connector, and I would like the MongoDB sink connector to group Kafka topics into different collections and databases (separate databases, to isolate unrelated data) according to their names. While researching I came across the topics.regex connector property in the Mongo docs. Unfortunately, that only creates a collection in the single configured Mongo database for each Kafka topic that matches the regex, and I'm planning to use the same MongoDB server to host many databases captured from multiple Debezium source connectors. Can you help me?
Note: I read about the Mongo sink setting FieldPathNamespaceMapper, but I'm not sure whether it would fit my needs or how to configure it correctly.
CodePudding user response:
topics.regex is a general sink connector property, not unique to Mongo. If I understand the problem correctly, collections will only be created in the configured database for Kafka topics that actually exist (i.e., match the pattern) and get consumed by the sink.
If you want collection names that differ from the topic names, you'll still need to consume the topics, but you can explicitly rename them via the RegexRouter transform before records are written to Mongo; a sketch follows.
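For example, here is a minimal sketch of a sink config that uses RegexRouter to strip the Debezium prefix from topic names before they become collection names. The topic prefix dbserver1, the database name, and the connection URI are assumptions, not values from your setup:

```properties
# Hypothetical MongoDB sink connector config (worker .properties format).
# The topic prefix "dbserver1", database name, and URI are placeholders;
# substitute your own Debezium server/topic naming.
name=mongo-sink-inventory
connector.class=com.mongodb.kafka.connect.sink.MongoSinkConnector
tasks.max=1
connection.uri=mongodb://mongo:27017
database=inventory

# Consume every topic produced for the "inventory" schema.
# Note: "." is a regex wildcard here, which also matches the literal dots.
topics.regex=dbserver1.inventory.*

# Rename e.g. "dbserver1.inventory.customers" -> "customers" so the
# collection is created under the short name instead of the full topic name.
transforms=route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=dbserver1.inventory.(.*)
transforms.route.replacement=$1
```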
CodePudding user response:
In Kafka Connect, workers are simply containers that can run multiple connectors. For each connector, the worker generates tasks according to internal rules and your configuration.

So, if you look at the MongoDB sink connector configuration options, you can create different connectors with the same connection.uri, database, and collection, or with different values. That means you can use the topics.regex or topics parameters to group the topics for a single connector with its own connection.uri, database, and collection, and run multiple such connectors at the same time, as in the sketch below.

Remember that if tasks.max > 1 in your connector, messages might be read out of order. If that is not a problem, set tasks.max close to the number of MongoDB shards; the worker will adjust the actual number of tasks automatically.
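As an illustration, here is a hedged sketch of two sink connectors on the same worker, each grouping one Debezium server's topics into its own database. The topic prefixes (pg1, pg2), database names, and URI are hypothetical:

```properties
# Connector 1: topics from the first Debezium server -> database "sales".
# The prefix "pg1", database name, and URI are assumptions for illustration.
name=mongo-sink-sales
connector.class=com.mongodb.kafka.connect.sink.MongoSinkConnector
tasks.max=1
connection.uri=mongodb://mongo:27017
database=sales
topics.regex=pg1.public.*
```

```properties
# Connector 2: topics from the second Debezium server -> database "billing",
# submitted as a separate connector on the same worker/cluster.
name=mongo-sink-billing
connector.class=com.mongodb.kafka.connect.sink.MongoSinkConnector
tasks.max=1
connection.uri=mongodb://mongo:27017
database=billing
topics.regex=pg2.public.*
```

Both connectors share one MongoDB server but write to isolated databases, which matches the goal of keeping data from multiple source connectors separate.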
You can create different connectors with the same connection.uri, database and collection, or different values. So you might use the topics.regex or topics parameters to group the topics for a single connector with its own connection.uri, database and collection, and run multiple connectors at the same time. Remember that if tasks.max > 1 in your connector, messages might be read out of order. If this is not a problem, set a value of tasks.max next to the number of mongodb shards. The worker will adjust the number of tasks automatically.