How do index updates work in Solr and Elasticsearch?

Time:02-28

I have an application that uses Event Sourcing and the Command Query Responsibility Segregation (CQRS) pattern. Development of the Command part is complete, and I now have to decide how to implement the Query part.

My system deals with customer orders, so when an event arrives for an order, that order is processed with its orderId and order payload. The problem is that, in this form, the only way to query orders is by orderId, so I can't ask a question like "give me all the orders in the system with status OPEN".

This is what the Query part is for. My candidate technologies for it are a classical solution like a PostgreSQL DB or, more elegant in my opinion, Solr/Elasticsearch.

I have basic knowledge of and experience with Solr/Elasticsearch and I want to use this opportunity to learn more, but here comes my dilemma. Another department in our company is already working with Elasticsearch, and a colleague from that department told me that updates in Elasticsearch are not a good idea. I didn't quite understand his argument, so I would like to describe what I am planning to do, so you can tell me whether it is a bad idea or whether Solr is better suited for it.

I am planning to send every status change of an order as an update to Elasticsearch, so it will look like the following.

id Status Customer Items
orderId1 -> order.SUBMITTED order.Customer order.Items
orderId1 -> order.CHANGED order.Customer1 order.Items
orderId1 -> order.PROCESSING order.Customer1 order.Items
orderId1 -> order.ON_DELIVERY order.Customer1 order.Items
orderId1 -> order.COMPLETE order.Customer1 order.Items

As you can see, I have to send several updates per orderId to Elasticsearch/Solr.

My colleague told me that indexed documents in Elasticsearch are immutable: when I send the order.SUBMITTED event to be indexed, it will create the document, but the order.CHANGED event will not update that document and will create another one instead. I can't quite judge the consequences of this, either for my business case (if I ask for the orders of Customer1, will I see both status SUBMITTED and status CHANGED, i.e. 2 records in the query response?) or operationally (additional load and storage).

Did I understand the behaviour of Elasticsearch correctly? If yes, will Solr behave any differently?

If I understood correctly and both behave the same, can I design anything differently that would help me reach my goals?

Finally, I have no problem using PostgreSQL for this solution; I just thought Elasticsearch or Solr would be a more natural choice for this problem. What do you think?

Thanks for your answers.

CodePudding user response:

Your colleague is partially correct about updates being costly in Elasticsearch (ES) and about indexed documents being immutable, but that doesn't mean ES is unsuitable for a system with frequent updates. In fact, thanks to its scalability and distributed nature, it is a popular choice and is used in high-throughput, low-latency systems (including search systems). You have a few misconceptions, which I will try to explain.

  1. Both ES and Solr are based on Lucene, and costly, immutable updates are a property of Lucene. So it doesn't matter whether you choose ES or Solr: you will be using Lucene underneath and will get the same update mechanism.
  2. Immutability doesn't mean that the old status of your order stays visible in the index forever. For example, if your order status is initially SUBMITTED and you later update it to CHANGED under the same document id, a query for the order returns only the latest status, provided a refresh has happened on the index (the default refresh interval in ES is 1 second). When a document is updated, ES writes the new version and marks the old document as deleted (a soft delete, i.e. a deleted flag), so soft-deleted documents are not returned by your searches. The permanent deletion of old documents happens later, during the merge process (explained in #3).
  3. ES periodically purges the old documents: in your case, the document with status SUBMITTED will be physically removed from the index during the merge process, so your index size doesn't keep growing with stale copies.
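
To make points 2 and 3 more concrete, here is a toy Python model of the mechanism. It is only a sketch and not how Lucene is actually implemented: the `Segment` and `ToyIndex` classes and all their methods are made up for illustration, and the refresh delay is not modelled at all. It only shows that an update is "mark the old copy as deleted, write a new copy", that searches skip the deleted copies, and that a merge drops them for good.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical stand-in for an immutable segment: docs are never edited in place."""
    docs: tuple                                  # (doc_id, source) pairs, fixed at creation
    deleted: set = field(default_factory=set)    # positions that are soft-deleted


class ToyIndex:
    """Illustrative only; none of these names exist in ES, Solr or Lucene."""

    def __init__(self):
        self.segments = []

    def index(self, doc_id, source):
        # Update = soft-delete every older copy, then write the new copy
        # into a brand-new segment. Existing docs are never rewritten.
        for seg in self.segments:
            for pos, (existing_id, _) in enumerate(seg.docs):
                if existing_id == doc_id:
                    seg.deleted.add(pos)
        self.segments.append(Segment(docs=((doc_id, dict(source)),)))

    def search(self, **criteria):
        # Soft-deleted copies are skipped, so only the latest version is returned.
        hits = []
        for seg in self.segments:
            for pos, (doc_id, source) in enumerate(seg.docs):
                if pos in seg.deleted:
                    continue
                if all(source.get(k) == v for k, v in criteria.items()):
                    hits.append((doc_id, source))
        return hits

    def merge(self):
        # Merge = copy only the live docs into one new segment, drop the rest.
        live = tuple(
            doc
            for seg in self.segments
            for pos, doc in enumerate(seg.docs)
            if pos not in seg.deleted
        )
        self.segments = [Segment(docs=live)] if live else []

    def stored_copies(self):
        return sum(len(seg.docs) for seg in self.segments)


index = ToyIndex()
for status in ("SUBMITTED", "CHANGED", "PROCESSING"):
    index.index("orderId1", {"status": status, "customer": "Customer1"})

hits = index.search(customer="Customer1")   # one hit, with the latest status
copies_before = index.stored_copies()       # old copies still occupy space
index.merge()
copies_after = index.stored_copies()        # stale copies are gone after the merge
```

In this model, asking for the orders of Customer1 gives a single record for orderId1, and the extra storage used by the old versions is only temporary.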

It is also very important to understand that this immutability is a huge benefit for search/read performance: because the segments (which contain the documents in ES) never change, they can safely be read by multiple threads at once and can also be cached.
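
As a rough illustration of why immutability helps here, the following Python sketch searches two read-only "segments" from several threads without any locking and caches the per-segment results. The data, `SEGMENTS`, `search_segment` and `search` are all hypothetical; this is not ES or Lucene code, just the general idea.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Each hypothetical segment is a tuple of (order_id, status) pairs.
# Tuples cannot be modified after creation, which mimics an immutable segment.
SEGMENTS = (
    (("orderId1", "COMPLETE"), ("orderId2", "OPEN")),
    (("orderId3", "OPEN"), ("orderId4", "PROCESSING")),
)


@lru_cache(maxsize=None)
def search_segment(segment, status):
    # Safe to cache: the segment never changes, so a cached result never goes stale.
    return tuple(order_id for order_id, s in segment if s == status)


def search(status):
    # No locks are needed because the threads only read immutable data.
    with ThreadPoolExecutor(max_workers=len(SEGMENTS)) as pool:
        per_segment = pool.map(search_segment, SEGMENTS, [status] * len(SEGMENTS))
    return [order_id for segment_hits in per_segment for order_id in segment_hits]


open_orders = search("OPEN")          # computed by scanning both segments
open_orders_again = search("OPEN")    # answered from the per-segment cache
cache_hits = search_segment.cache_info().hits
```

If the segments could be modified in place, every cached result would have to be invalidated on each update and the readers would need to coordinate with the writers.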
