Microservices: Best Practice for combining data from Instances of SAME service?

Time:07-14

Scenario:

We have two instances of the same microservice, which receive two events (pictured as Event1 and Event2 below) from Kafka. Each instance needs to combine the result of its own transformation with that of the other instance, and only one notification should be sent downstream.

I am trying to understand the best way to do this. Specifically, how do I make each instance of the microservice:

  • wait for the other instance,
  • then combine the individual transforms into one,
  • and then check whether the other instance has already combined and sent the notification and, if so, skip sending?

[Diagram: Combining event data in multiple instances of same micro service]

CodePudding user response:

There are multiple ways to solve this kind of data sync problem. But since you are using Kafka, you should use the functionality Kafka offers out of the box.

Option 1 (Preferable)

Kafka guarantees the order of events within a partition, and each partition is consumed by only one consumer in a consumer group. Therefore, if your producer sends these events to the same partition, they will be received by the same consumer (in your case, the same instance or, if you run multiple consumer threads, the same thread of that instance). With this, you wouldn't need to worry about syncing events across multiple consumers.

If you are using Spring Boot, this can be achieved by supplying a message key when sending with KafkaTemplate; with the default partitioner, records that share a key go to the same partition.

More on this topic : How to maintain message ordering and no message duplication
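
The same-key-same-partition property behind Option 1 can be illustrated with a minimal, plain-Java sketch. It is not Kafka's actual partitioner (Kafka's default uses a murmur2 hash); `partitionFor` and the `"order-42"` correlation key are hypothetical names used only for illustration. The point is that any deterministic hash of the key, taken modulo the partition count, routes Event1 and Event2 to one partition as long as they carry the same key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PartitionKeySketch {
    // Simplified stand-in for a key-based partitioner: hash the key bytes and
    // take the result modulo the partition count. Kafka's default partitioner
    // uses a different hash function; only the "same key, same partition"
    // property is illustrated here.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        // Hypothetical business key shared by Event1 and Event2.
        String correlationId = "order-42";
        int partitionOfEvent1 = partitionFor(correlationId, 6);
        int partitionOfEvent2 = partitionFor(correlationId, 6);
        System.out.println(partitionOfEvent1 == partitionOfEvent2); // prints "true"
    }
}
```

In a Spring application the equivalent step would be passing that shared key as the record key, for example via `KafkaTemplate.send(topic, key, value)`; note that the mapping only stays stable while the topic's partition count does not change.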

Option 2

If you don't have control over the producer, you will need to handle this on the application side. You will need a distributed cache, e.g. Redis, for this. Maintain a boolean state (completed: true or false) for each of these events, and run the downstream logic only once all related events have been received.
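
As a rough sketch of Option 2, the following uses two in-process `ConcurrentHashMap`s as stand-ins for the distributed cache; in a real deployment that state would live in something like Redis, where an atomic set-if-absent (for example `SET ... NX`) could play the role of `putIfAbsent`. The class name, the `correlationId` key, and the string payloads are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharedStateCombiner {
    // Stand-ins for a distributed cache; both service instances are assumed
    // to read and write the same store.
    private final Map<String, String> partials = new ConcurrentHashMap<>();
    private final Map<String, Boolean> sent = new ConcurrentHashMap<>();

    // Called by whichever instance finished transforming one event.
    // Returns the combined payload if this caller should send the
    // notification, or null if it should not (yet).
    String onTransformed(String correlationId, String eventType, String transformed) {
        partials.put(correlationId + ":" + eventType, transformed);
        String first = partials.get(correlationId + ":event1");
        String second = partials.get(correlationId + ":event2");
        if (first == null || second == null) {
            return null; // the other half has not arrived yet
        }
        // Atomic claim: only one caller ever sees null from putIfAbsent.
        boolean claimed = sent.putIfAbsent(correlationId, Boolean.TRUE) == null;
        return claimed ? first + "+" + second : null;
    }

    public static void main(String[] args) {
        SharedStateCombiner store = new SharedStateCombiner();
        System.out.println(store.onTransformed("order-42", "event1", "A")); // prints "null"
        System.out.println(store.onTransformed("order-42", "event2", "B")); // prints "A+B"
        System.out.println(store.onTransformed("order-42", "event2", "B")); // prints "null"
    }
}
```

Writing the partial result before checking for the other half matters: whichever instance writes last is guaranteed to see both halves, and the atomic claim prevents a second notification if both instances see them at the same time.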

NOTE: Assuming you are using a persistence layer, combining and transforming the events should be trivial. If you are not using any persistence, you would need an in-memory cache for Option 1. For Option 2 it remains trivial, because you already have the distributed cache and can store whatever you need in it.

CodePudding user response:

It's worth noting that you cannot guarantee all three of the following at once:

  • a notification which needs to be sent will be sent in some finite period of time (this is a reasonable working definition of availability in this context)
  • no notification will be sent more than once
  • the system tolerates either instance failing and arbitrary network delays

Fundamentally you will need each instance to tell the other one that it's claiming responsibility for sending the notification or ask the other instance if it's claimed that responsibility. If telling, then if it doesn't wait for acknowledgement you cannot guarantee "not more than once". If you tell and wait for acknowledgement, you cannot guarantee "will be sent in a finite period". If you ask, you will likewise have to decide whether or not to send in the case of no reply.

You could have the instances use some external referee: this only punts the CAP tradeoff to that referee. If you choose a CP referee, you will be giving up on guaranteeing a notification will be sent. If you choose AP, you will be giving up on guaranteeing that no notification gets sent more than once.

You'll have to choose which of those three guarantees you want to weaken; deciding how you weaken will guide your design.
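
One way to make that choice explicit in code is to treat "no reply from the other instance or referee" as its own outcome and let a policy decide what happens. This is only a sketch of the decision point; the `Policy` and `ClaimReply` enums and `shouldSend` are hypothetical names, not part of any library.

```java
class ClaimPolicySketch {
    // Which guarantee to keep when the claim cannot be confirmed.
    enum Policy { PREFER_DELIVERY, PREFER_NO_DUPLICATES }

    // Possible outcomes of asking to claim responsibility for the notification.
    enum ClaimReply { GRANTED, DENIED, NO_REPLY }

    static boolean shouldSend(ClaimReply reply, Policy policy) {
        switch (reply) {
            case GRANTED:
                return true;   // we own the notification
            case DENIED:
                return false;  // the other instance owns it
            default:
                // NO_REPLY: sending risks a duplicate, not sending risks
                // losing the notification. The policy picks which risk to take.
                return policy == Policy.PREFER_DELIVERY;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldSend(ClaimReply.NO_REPLY, Policy.PREFER_DELIVERY));      // prints "true"
        System.out.println(shouldSend(ClaimReply.NO_REPLY, Policy.PREFER_NO_DUPLICATES)); // prints "false"
    }
}
```

With `PREFER_DELIVERY` the downstream consumer should be able to tolerate duplicates (for example by deduplicating on a business ID); with `PREFER_NO_DUPLICATES` some notifications may never be sent during a failure.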

CodePudding user response:

Consider using the temporal.io open source project to implement this. You can code your logic as a simple stateful class that reacts to the events. The idea of Temporal is that the instance of that class is not linked to a specific instance of the service. Each object is unique and identified by a business ID. So all the cross-process coordination is handled by the Temporal runtime.

Here is sample code using the Temporal Java SDK. Go, TypeScript/JavaScript, PHP, and Python are also supported.

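  // Imports for the enclosing source file (both types below are nested in an outer class, omitted here).
  import io.temporal.workflow.SignalMethod;
  import io.temporal.workflow.Workflow;
  import io.temporal.workflow.WorkflowInterface;
  import io.temporal.workflow.WorkflowMethod;

  @WorkflowInterface
  public interface CombinerWorkflow {
    // Main workflow method: runs until both events have been combined and sent.
    @WorkflowMethod
    void combine();

    // Signals deliver the two events to the workflow instance.
    @SignalMethod
    void event1(Event1 event);

    @SignalMethod
    void event2(Event2 event);
  }

  // Workflow implementation: waits for both events, combines them, and notifies once.
  public static class CombinerWorkflowImpl implements CombinerWorkflow {

    private Event1 event1;
    private Event2 event2;

    // Activity stub used to send the notification. Depending on the SDK version,
    // activity options (such as a timeout) may need to be supplied here.
    private Notifier notifier = Workflow.newActivityStub(Notifier.class);

    @Override
    public void combine() {
      // Block until both signals have been received.
      Workflow.await(() -> event1 != null && event2 != null);
      // combine(Event1, Event2) is an application-specific helper, not shown here.
      Event3 result = combine(event1, event2);
      notifier.notify(result);
    }

    @Override
    public void event1(Event1 event) {
      this.event1 = event;
    }

    @Override
    public void event2(Event2 event) {
      this.event2 = event;
    }
  }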

This code may look too simple because it never touches a persistence layer. But Temporal ensures that all the data, and even the threads, are durably preserved for as long as needed (potentially years). So infrastructure and process failures will not stop its execution.
