Lucene vs Elasticsearch query syntax

Time:12-06

I can see that Elasticsearch supports both Lucene syntax and its own query language.

You can use either one and get the same kinds of results.

Example (it could probably be done differently, but it shows what I mean):

Both of these queries produce the same result, one written in Lucene query syntax and the other in the Elasticsearch Query DSL.

GET /index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "field101:Denmark"
          }
        }
      ]
    }
  }
}

GET /index/_search
{
  "query": {
    "match": {
      "field101": {
        "query": "Denmark"
      }
    }
  }
}

I was wondering: are there any implications of choosing one approach over the other (performance, optimizations and so on)? Or is the Elasticsearch query syntax just translated into a Lucene query somewhere, since Elasticsearch runs Lucene as its underlying search engine?

Answer:

I was wondering: are there any implications of choosing one approach over the other (performance, optimizations and so on)?

The Elasticsearch Query DSL is converted into a Lucene query under the hood. You can set "profile": true in the request to see which Lucene query it becomes and how much time is spent executing it.
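If you send requests from a script rather than the Kibana console, the profile flag is simply another top-level key next to "query" in the request body. Here is a minimal Python sketch (stdlib only; the helper name with_profile and the constant names are hypothetical) that adds it to either of the two queries from the question:

```python
import copy
import json

# The two equivalent request bodies from the question.
QUERY_STRING_BODY = {
    "query": {
        "bool": {
            "must": [
                {"query_string": {"query": "field101:Denmark"}}
            ]
        }
    }
}

MATCH_BODY = {
    "query": {
        "match": {
            "field101": {"query": "Denmark"}
        }
    }
}


def with_profile(body: dict) -> dict:
    """Return a copy of a search body with profiling enabled (hypothetical helper)."""
    profiled = copy.deepcopy(body)  # leave the caller's dict untouched
    profiled["profile"] = True      # top-level flag, sibling of "query"
    return profiled


if __name__ == "__main__":
    for body in (QUERY_STRING_BODY, MATCH_BODY):
        # Print the JSON you would send to GET /<index>/_search
        print(json.dumps(with_profile(body), indent=2))
```

Copying the body keeps the original queries reusable without the profiling overhead.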

I would say there are no significant performance implications, and you should always use the DSL, because in many cases Elasticsearch will do optimizations for you. Also, query_string expects well-formed Lucene syntax, so you can get syntax errors (try sending "Denmark AND" as a query_string query).
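The syntax-error point is easiest to see by looking at where user input ends up. In a match query the text is only a value to be analyzed, while in query_string it is parsed as Lucene syntax, so an input such as "Denmark AND" is harmless in the first and fails to parse in the second. A small Python sketch; the helper names match_body and query_string_body are hypothetical, and the error itself only appears when Elasticsearch executes the query_string request:

```python
def match_body(field: str, user_text: str) -> dict:
    """match: user_text is analyzed as plain text, never parsed as query syntax."""
    return {"query": {"match": {field: {"query": user_text}}}}


def query_string_body(field: str, user_text: str) -> dict:
    """query_string: user_text is parsed with Lucene query syntax."""
    return {"query": {"query_string": {"default_field": field, "query": user_text}}}


if __name__ == "__main__":
    text = "Denmark AND"  # dangling boolean operator
    # Safe: the text is analyzed into terms; no operators are interpreted.
    print(match_body("field101", text))
    # Risky: Elasticsearch will try to parse "Denmark AND" and reject the dangling AND.
    print(query_string_body("field101", text))
```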

Or is the Elasticsearch query syntax just translated into a Lucene query somewhere, since Elasticsearch runs Lucene as its underlying search engine?

Yes. You can try it yourself:

GET test_lucene/_search
{
  "profile": true,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "field101:Denmark"
          }
        }
      ]
    }
  }
}

This produces:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "profile": {
    "shards": [
      {
        "id": "[KGaFbXIKTVOjPDR0GrI4Dw][test_lucene][0]",
        "searches": [
          {
            "query": [
              {
                "type": "TermQuery",
                "description": "field101:denmark",
                "time_in_nanos": 3143,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 0,
                  "match": 0,
                  "next_doc_count": 0,
                  "score_count": 0,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 0,
                  "advance_count": 0,
                  "score": 0,
                  "build_scorer_count": 0,
                  "create_weight": 3143,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 0
                }
              }
            ],
            "rewrite_time": 2531,
            "collector": [
              {
                "name": "SimpleTopScoreDocCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 1115
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}

And the equivalent match query:

GET /test_lucene/_search
{
  "profile": true,
  "query": {
    "match": {
      "field101": {
        "query": "Denmark"
      }
    }
  }
}

This produces the same Lucene query, a TermQuery on field101:denmark:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "profile": {
    "shards": [
      {
        "id": "[KGaFbXIKTVOjPDR0GrI4Dw][test_lucene][0]",
        "searches": [
          {
            "query": [
              {
                "type": "TermQuery",
                "description": "field101:denmark",
                "time_in_nanos": 3775,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 0,
                  "match": 0,
                  "next_doc_count": 0,
                  "score_count": 0,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 0,
                  "advance_count": 0,
                  "score": 0,
                  "build_scorer_count": 0,
                  "create_weight": 3775,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 0
                }
              }
            ],
            "rewrite_time": 3483,
            "collector": [
              {
                "name": "SimpleTopScoreDocCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 1780
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}
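Both responses contain the same TermQuery with the description field101:denmark. To check this without reading the JSON by eye, you can pull the Lucene query type and description out of each profile response. A minimal Python sketch, assuming the response has been decoded into a dict; the helper name lucene_queries is hypothetical and it reads only the top-level entries of each search's "query" array:

```python
def lucene_queries(profile_response: dict) -> list[tuple[str, str]]:
    """Collect (type, description) of each top-level Lucene query in a profile response."""
    result = []
    for shard in profile_response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for query in search.get("query", []):
                result.append((query["type"], query["description"]))
    return result


# Trimmed from the two responses above (only the fields the helper reads).
query_string_profile = {"profile": {"shards": [{"searches": [{"query": [
    {"type": "TermQuery", "description": "field101:denmark", "time_in_nanos": 3143}
]}]}]}}
match_profile = {"profile": {"shards": [{"searches": [{"query": [
    {"type": "TermQuery", "description": "field101:denmark", "time_in_nanos": 3775}
]}]}]}}

if __name__ == "__main__":
    # Both syntaxes end up as the same Lucene query.
    print(lucene_queries(query_string_profile) == lucene_queries(match_profile))
    print(lucene_queries(match_profile))
```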

As you can see, the times are in nanoseconds, not even milliseconds (3143 ns is about 0.003 ms), so the overhead of either syntax is negligible.
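To put the profile numbers on a more familiar scale, you can convert them to milliseconds. A minimal Python sketch; the helper name timings_ms is hypothetical, and it relies on the Profile API reporting each top-level query's time_in_nanos inclusive of its children, with rewrite_time reported separately per search:

```python
NANOS_PER_MILLI = 1_000_000


def timings_ms(profile_response: dict) -> list[dict]:
    """Per-shard query and rewrite time in milliseconds, read from a profile response."""
    rows = []
    for shard in profile_response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            # Top-level query entries already include the time of their children.
            query_nanos = sum(q["time_in_nanos"] for q in search.get("query", []))
            rows.append({
                "shard": shard["id"],
                "query_ms": query_nanos / NANOS_PER_MILLI,
                "rewrite_ms": search.get("rewrite_time", 0) / NANOS_PER_MILLI,
            })
    return rows
```

Fed the query_string response above (time_in_nanos 3143, rewrite_time 2531), it reports roughly 0.003 ms for the query and 0.0025 ms for the rewrite.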

You can read more about the Profile API here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html
