Am I correct that Elasticsearch simple_query_string does not support wildcards, but only prefix quer

Time:08-04

There are a number of related questions on Stack Overflow, but they mostly suggest other ways to do wildcards. I'm trying to analyze an existing install, so alternatives are not useful.

I think the issue I ran into was that simple_query_string and wildcard queries do different things with infix *.

The query

r*g

expands to " msg:r msg:g" with simple_query_string:

GET /test/_validate/query?rewrite=true
{
  "query": {
    "simple_query_string" : {
        "query": "r*g",
        "fields": ["msg"],
        "default_operator": "AND"
    }
  }
}

Returns

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "test",
      "valid": true,
      "explanation": " msg:r  msg:g"
    }
  ]
}

This shows that simple_query_string is not treating the * as a wildcard at all, not even as the prefix r*. So it will not match "running", for example.

On the other hand, wildcard query does handle infix.

GET /test/_validate/query?rewrite=true
{
  "query": {
    "wildcard": {
      "msg": {
        "value": "r*g",
        "case_insensitive": true
      }
    }
  }
}

returns

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "test",
      "valid": true,
      "explanation": """msg:AutomatonQuery {
org.apache.lucene.util.automaton.Automaton@78b3b2e7}"""
    }
  ]
}

While the automaton query could use better output, r*g as a wildcard query will match "running", but a simple_query_string will not.

So, am I correct that the same query string matches very different sets of documents with simple_query_string vs. a wildcard query?

CodePudding user response:

Yes, you are right. simple_query_string only supports a wildcard at the end of a term. From the documentation:

* at the end of a term signifies a prefix query
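To see the prefix case on the same index, a request along these lines could be validated. This is only a sketch: the query value run* is an illustrative choice, and the rewritten explanation is not shown because it depends on the Elasticsearch version and the analyzer of the msg field.

```
GET /test/_validate/query?rewrite=true
{
  "query": {
    "simple_query_string" : {
        "query": "run*",
        "fields": ["msg"],
        "default_operator": "AND"
    }
  }
}
```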

The documentation also contains the line below, which explains why the infix * is silently ignored in your scenario:

the simple_query_string query does not return errors for invalid syntax. Instead, it ignores any invalid parts of the query string.

You can use the query_string query instead, but it is strict and validates the query syntax:

You can use the query_string query to create a complex search that includes wildcard characters, searches across multiple fields, and more. While versatile, the query is strict and returns an error if the query string includes any invalid syntax.
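For comparison, the original pattern can be passed through query_string against the same index and field as in the question. This is a minimal sketch; the output is omitted since it depends on the version and mapping.

```
GET /test/_validate/query?rewrite=true
{
  "query": {
    "query_string" : {
        "query": "r*g",
        "fields": ["msg"],
        "default_operator": "AND"
    }
  }
}
```

Unlike simple_query_string, a malformed query string here returns an error rather than being partially ignored.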
