Elasticsearch query by value in array

Time:06-30

I have the following document indexed in Elasticsearch 6:

{
  "id": 1234,
  ...,
  "images": [
    {
      "id": 1703805,
      ...,
      "language_codes": [],
      "ingest_source_ids": [123]
    },
    {
      "id": 2481938,
      ...,
      "language_codes": ["EN"],
      "ingest_source_ids": [1,2,3]
    }
  ]
}

The images object is mapped as nested.

I can find the document just fine using this query:

{
  "query": {
    "nested": {
      "path": "images",
      "query": {
        "term": {
          "images.ingest_source_ids": 123
        }
      }
    }
  }
}

But if I search on language_codes instead, the document is not found:

{
  "query": {
    "nested": {
      "path": "images",
      "query": {
        "term": {
          "images.language_codes": "EN"
        }
      }
    }
  }
}

ingest_source_ids has been in the documents since day one; the language_codes field was added later. I recall that Elasticsearch infers a mapping dynamically from the first documents it sees, but as far as I can tell from the documentation, arrays need no special mapping: any field can hold an array as long as all values are of the same type.

That works fine for ingest_source_ids, where all values are numeric. The values in language_codes are always strings, so the same should apply.

What am I missing?

Answer:

If you have not explicitly defined a mapping for language_codes, dynamic mapping indexes string values by default as:

"language_codes": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

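A term query is not analyzed: it looks up the exact term in the index. The text field is analyzed at index time, and the default standard analyzer lowercases "EN" to "en", so the exact term "EN" exists only in the keyword sub-field. ingest_source_ids does not have this problem because numeric values are mapped to a numeric type and are never analyzed. Run the term query against the keyword sub-field instead.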
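
To check what actually ends up in the index for the text field, the value can be run through the analyzer by hand. This is a minimal sketch, assuming the body below is sent to the `_analyze` endpoint and that the field uses the default standard analyzer (nothing in the question suggests a custom one):

```json
{
  "analyzer": "standard",
  "text": "EN"
}
```

Under that assumption the response should list a single token, `en`, which is why a term query for `EN` on images.language_codes finds nothing.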

Replace your query with the following, which targets the keyword sub-field:

{
  "query": {
    "nested": {
      "path": "images",
      "query": {
        "term": {
          "images.language_codes.keyword": "EN"
        }
      }
    }
  }
}
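
If you would rather keep querying images.language_codes without the .keyword suffix, another option is to map the field as keyword explicitly. The body below is only a sketch of an index-creation request for Elasticsearch 6; the mapping type name `_doc` is an assumption (your index may use a different type name), and only the fields relevant here are shown:

```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "images": {
          "type": "nested",
          "properties": {
            "language_codes": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```

The type of an existing field cannot be changed in place, so this would mean creating a new index with this mapping and reindexing the documents into it.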
