elasticsearch - how to combine results from two indexes

Time:06-10

I have CDR log entries in Elasticsearch in the format below. When I create the document, I don't yet have a value for the delivery_status field.

{
  msgId: "384573847",
  msgText: "Message text to be delivered",
  submit_status: true,
  ...
  delivery_status: //comes later
}

Later when delivery status becomes available, I can update this record.

But I have seen that update queries bring down the ingestion rate. With pure inserts using bulk operations, I can reach 3000 or more transactions/sec, but once I mix in updates, ingestion slows to a crawl at 100 or fewer transactions/sec.

So, I am thinking that I could create another index like below, where I store the delivery status along with msgId:

{
  msgId: "384573847",
  delivery_status: 0
}

With this approach, I end up with 2 indices (similar to master-detail tables in an RDBMS). Is there a way to query the record by joining these indices? I have heard of aliases, but I could not fully understand the concept or whether it applies to my use case.

Thanks to anyone who can help with suggestions.

CodePudding user response:

As you mentioned, you can index the two kinds of documents in separate indices and use Elasticsearch's collapse feature to retrieve both documents for each msgId in a single search.
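On the ingestion side, here is a minimal Python sketch of how the delivery statuses could be written as pure bulk inserts rather than updates. It only builds the newline-delimited body that the _bulk endpoint expects. The function name build_bulk_body, the default index name index3, and the choice of msgId as the document _id are assumptions made for illustration, not something from the question.

```python
import json


def build_bulk_body(statuses, index_name="index3"):
    """Build an NDJSON body for the _bulk API from (msgId, delivery_status) pairs.

    index_name and the use of msgId as _id are illustrative assumptions.
    """
    lines = []
    for msg_id, delivery_status in statuses:
        # Action line: a plain "index" action, so no existing document is updated.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(msg_id)}}))
        # Source line: the small status document keyed by the shared msgId.
        lines.append(json.dumps({"msgId": str(msg_id), "delivery_status": delivery_status}))
    # Every line of a bulk body, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(build_bulk_body([("384573847", 0)]), end="")
```

The resulting string can be sent as the body of a POST _bulk request with the Content-Type application/x-ndjson.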

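Suppose the documents are indexed in index2 and index3 and share a common msgId field. Collapse requires that field to be a single-valued keyword or numeric field, so it should be mapped the same way in both indices. You can then use the following query: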

POST index2,index3/_search
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "msgId",
    "inner_hits": {
      "name": "most_recent",
      "size": 5
    }
  }
}

However, you need to consider query performance on large data sets. Benchmark the query and decide whether combining the data at index time or at query time works better for you.

Regarding aliases: the query above names both indices explicitly, comma-separated (index2,index3). With an alias, you can use a single unified name to query both indices.

You can add both indices to a single alias with the following command:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index3",
        "alias": "order"
      }
    },
    {
      "add": {
        "index": "index2",
        "alias": "order"
      }
    }
  ]
}

Now you can run the same query with the alias name instead of the index names:

POST order/_search
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "msgId",
    "inner_hits": {
      "name": "most_recent",
      "size": 5
    }
  }
}
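The collapsed search returns one group per msgId, with the documents from both indices listed under the inner_hits entry named most_recent. To get a single joined record per message on the client side, those inner hits still have to be merged. A minimal Python sketch of that step follows. The function name merge_collapsed_hits is hypothetical, and it assumes the response has already been parsed into a dict and that each document's _source contains msgId.

```python
def merge_collapsed_hits(response, inner_hits_name="most_recent"):
    """Merge the inner hits of each collapsed group into one record per msgId.

    `response` is the parsed JSON body of the collapsed search (an assumption
    about how the caller obtained it). Returns a dict keyed by msgId.
    """
    merged = {}
    for group in response["hits"]["hits"]:
        record = {}
        inner = group["inner_hits"][inner_hits_name]["hits"]["hits"]
        for hit in inner:
            for key, value in hit["_source"].items():
                # Keep the first non-None value seen for each field.
                if record.get(key) is None:
                    record[key] = value
        merged[record["msgId"]] = record
    return merged


if __name__ == "__main__":
    sample = {
        "hits": {"hits": [{
            "inner_hits": {"most_recent": {"hits": {"hits": [
                {"_index": "index2", "_source": {"msgId": "384573847", "submit_status": True}},
                {"_index": "index3", "_source": {"msgId": "384573847", "delivery_status": 0}},
            ]}}}
        }]}
    }
    print(merge_collapsed_hits(sample))
```

Merging on the client keeps the query itself unchanged. The inner_hits size of 5 caps how many documents per msgId are available to merge.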