Filtering results in memory to search across multiple Elasticsearch indexes

Time:07-06

I have two indexes that share one common field (essentially a relationship).

Since Elasticsearch does not offer join-style filters across multiple indexes, should we hold the results in a variable in memory and filter them in Node.js? That would basically mean my application itself is acting as a database server.

We previously used MongoDB, which is also a NoSQL database, and we handled this with aggregate queries, but Elasticsearch does not seem to provide an equivalent.

So even if we use both databases together, we have to store their results somewhere to filter them further, because we give users advanced search functionality that lets them filter data from multiple collections.

So should we store results in memory to filter the data further? We currently offer customers advanced search over 100 million records, but without the advanced text search that Elasticsearch provides. We now plan to offer Elasticsearch text search to customers.

What approach do you suggest for using MongoDB and Elasticsearch together? We are using Node.js to serve data.

Or which of these options should we choose?

  1. Denormalizing: Flatten your data
  2. Application-side joins: Run multiple queries on normalized data
  3. Nested objects: Store arrays of objects
  4. Parent-child relationships: Store multiple documents through joins

https://blog.mimacom.com/parent-child-elasticsearch/

https://spoon-elastic.com/all-elastic-search-post/simple-elastic-usage/denormalize-index-elasticsearch/

CodePudding user response:

Storing things client side in memory is not the solution. The simplest way to solve this problem is to make one combined index. It's trivial to do: insert all the documents from index 2 into index 1, and prefix every field that comes from index 2 with something like "idx2". That way you won't overwrite any fields that share a name. You can use an ingest pipeline to do this, or just do it client side. You will only ever do this once.
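The client-side variant of that merge could look roughly like the sketch below. It is only an illustration: the index name "index-1", the prefix "idx2", the sample fields, and the helper names `prefixFields` and `toBulkBody` are all hypothetical, and leaving the shared field unprefixed (via `keep`) is an assumption made so both kinds of document can still be matched on it. The output is the alternating action/source array that the Elasticsearch Bulk API expects, which you would then hand to whichever client you use.

```javascript
// Hypothetical names throughout: "idx2", "index-1", and "customerId" are
// illustrative and do not come from the original post.

// Return a copy of `doc` with every top-level field renamed to
// `<prefix>_<field>`, except fields listed in `keep` (e.g. the shared key).
function prefixFields(doc, prefix, keep = []) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    out[keep.includes(key) ? key : `${prefix}_${key}`] = value;
  }
  return out;
}

// Build the alternating action/source lines used by the Elasticsearch Bulk API.
function toBulkBody(docs, targetIndex, prefix, keep = []) {
  return docs.flatMap((doc) => [
    { index: { _index: targetIndex } },
    prefixFields(doc, prefix, keep),
  ]);
}

// Sample document standing in for something read from index 2.
const index2Docs = [{ customerId: 7, status: "open", total: 42 }];
const body = toBulkBody(index2Docs, "index-1", "idx2", ["customerId"]);
console.log(JSON.stringify(body));
```

Only top-level fields are renamed here; nested objects are copied as they are.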

After that, you can run aggregations on the single index, since all the data is now in one index.
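As a rough sketch of what such a request might look like, the helper below builds a search body that filters on a prefixed field and buckets the matches by the shared field with a terms aggregation. The field names "idx2_status" and "customerId" and the helper name `buildSharedKeyAggregation` are hypothetical, and the sketch assumes both fields are mapped as keyword or numeric types rather than analyzed text.

```javascript
// Hypothetical field names: "customerId" (shared key) and "idx2_status".
// Assumes both are mapped as keyword or numeric fields, not analyzed text.
function buildSharedKeyAggregation(filterField, filterValue, keyField, size = 10) {
  return {
    size: 0, // return only aggregation buckets, no hits
    query: {
      bool: { filter: [{ term: { [filterField]: filterValue } }] },
    },
    aggs: {
      by_key: { terms: { field: keyField, size } },
    },
  };
}

const searchBody = buildSharedKeyAggregation("idx2_status", "open", "customerId");
console.log(JSON.stringify(searchBody, null, 2));
```

The resulting object would be sent as the body of a search request against the combined index.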

If you are using something other than ES as your primary data store, you need to reconfigure the indexing operation so that everything that previously went into index 2 goes into index 1 as well (with the prefixed field names).

100 million records is trivial for something like Elasticsearch. Doing any kind of "joins" client side is NOT RECOMMENDED, as it negates the entire value of using ES.

If you need any further help on executing this, feel free to contact me. I have 11 years of experience with ES, and I have seen people struggle with "joins" 99% of the time. :)

The first thing to do when coming from MySQL/PostgreSQL or even MongoDB is to restructure the indices to suit your querying needs. Never try to relate data across multiple indices; ES is not built for that.

HTH.
