I am new to the Kafka connector and have been exploring it for about a week. I have created and updated MongoDB documents via the MongoDB connector using curl commands. I am struggling a bit to understand the concept and implementation of the points below.
1) We register the connector with a curl command, each time with a unique name, before producing a message. How can this be automated? For example, if I pass data from my application to the producer, should I call the curl command for each and every request?
2) I need to maintain a history collection, so I need to pass two collections and two topics (one for updating and one for creating). How will I manage this with the curl configuration? My curl update configuration is below:
curl -X POST -H "Content-Type: application/json" -d '{"name":"test-students-update",
"config":{"topics":"topicData",
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max":"1",
"connection.uri":"mongodb://localhost:27017",
"database":"quickstart",
"collection":"topicData",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"key.converter.schemas.enable":"false",
"value.converter.schemas.enable":"false",
"document.id.strategy":"com.mongodb.kafka.connect.sink.processor.id.strategy.BsonOidStrategy",
"document.id.strategy":"com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
"value.projection.list":"tokenNumber",
"value.projection.type":"whitelist",
"writemodel.strategy":"com.mongodb.kafka.connect.sink.writemodel.strategy.UpdateOneTimestampsStrategy"
}}' localhost:8083/connectors
CodePudding user response:
Not sure what you mean by automating the curl command for your MongoDB Sink Connector, or why you would need to run the curl command every time. Kindly clarify.
The existing MongoDB Sink Connector can easily be installed from Confluent Hub and run as a standalone service to handle data upserts.
You can have a look at https://www.mongodb.com/docs/kafka-connector/current/sink-connector/fundamentals/#std-label-kafka-sink-fundamentals
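For illustration, here is a minimal sketch of running the sink in standalone mode (file names, paths, and the plugin version tag are assumptions; the connector settings mirror the question's config):

# install the connector plugin from Confluent Hub (version tag is an assumption)
confluent-hub install mongodb/kafka-connect-mongodb:latest

# mongo-sink.properties -- the same connector config as a flat properties file
name=test-students-update
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
topics=topicData
connection.uri=mongodb://localhost:27017
database=quickstart
collection=topicData
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

# start a standalone worker with that connector; registration happens once at startup
bin/connect-standalone.sh config/connect-standalone.properties mongo-sink.properties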
CodePudding user response:
"every time with unique name before producing the message"
This is not necessary. Post the connector once and it'll start a Kafka consumer and wait for data, just like any other Kafka client.
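For example, once it is posted you can verify it stays registered and running via the Connect REST API (connector name taken from the question):

# list registered connectors
curl localhost:8083/connectors
# check the connector's state and its tasks
curl localhost:8083/connectors/test-students-update/status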
"pass the data from my application to producer should I call the curl command for each and every request"
As stated, no.
"How it will be automated"
You don't necessarily need to use curl. If you're using Kubernetes, there are CRDs for KafkaConnect. Otherwise, Terraform providers work with the Connect API as well. Or you can continue to use curl in some CI/CD pipeline, but it only needs to run once to start the connector.
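If you do script it, here is a sketch of an idempotent registration step: a PUT to the connector's config endpoint creates the connector if it doesn't exist and updates it in place if it does, so re-running the pipeline is harmless:

curl -X PUT -H "Content-Type: application/json" localhost:8083/connectors/test-students-update/config -d '{
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max":"1",
"topics":"topicData",
"connection.uri":"mongodb://localhost:27017",
"database":"quickstart",
"collection":"topicData",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter"
}'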
"need to pass two collections and two topics (one for updating and one for creating)"
The collection field in the connector can only reference one collection. Therefore, you'd need two separate connectors; all Kafka events would then be inserted or updated into those individual collections and would not reference one another unless your schema model uses ObjectId references. A sketch of the second connector follows.
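Here is a hedged sketch of that second connector, alongside the update connector from the question (the name test-students-create, the topic createTopic, and the collection historyData are hypothetical):

curl -X POST -H "Content-Type: application/json" -d '{"name":"test-students-create",
"config":{"topics":"createTopic",
"connector.class":"com.mongodb.kafka.connect.MongoSinkConnector",
"tasks.max":"1",
"connection.uri":"mongodb://localhost:27017",
"database":"quickstart",
"collection":"historyData",
"key.converter":"org.apache.kafka.connect.storage.StringConverter",
"value.converter":"org.apache.kafka.connect.storage.StringConverter",
"writemodel.strategy":"com.mongodb.kafka.connect.sink.writemodel.strategy.InsertOneDefaultStrategy"
}}' localhost:8083/connectors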
Alternatively, redesign your producer to send to one topic; then inserts and updates (and deletes) can happen in one collection based on the key of the record, as sketched below.
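For instance, a minimal sketch using the console producer (the key values here are illustrative stand-ins for something like tokenNumber); with keyed records, an upsert write-model strategy can route both creates and updates into a single collection:

# produce keyed records to one topic; key and value separated by ':'
kafka-console-producer --bootstrap-server localhost:9092 --topic topicData \
  --property parse.key=true --property key.separator=:
101:{"tokenNumber":101,"status":"created"}
101:{"tokenNumber":101,"status":"updated"}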