ElasticSearch: Search for all documents with the same value for all keys in a flattened property

Time:11-29

Let's assume I have two kinds of documents in Elasticsearch, where "map" is of type flattened:

doc1: {
"name": "foo1",
"map": {
  "key1": 100,
  "key2": 100
  }
}
doc2: {
"name": "foo2",
"map": {
  "key1": 100,
  "key2": 90
  }
}

Can I search Elasticsearch for all documents whose "map" properties (e.g. key1, key2) all have the same value (e.g. key1=100, key2=100), so that the query returns doc1, without knowing in advance which properties exist under "map"?

Thanks!

Answer:

Yes. There are two ways to achieve this:

  1. Adding a flag field to the documents via ingest pipeline, then running a regular filter against this new field (recommended)
  2. Generating the flag field on the fly via runtime fields

Option 1 is recommended because evaluating a script against every document on every query doesn't scale well; a flag field computed once at index time is much cheaper to filter on. Given your two documents:

POST test_script/_doc
{
  "name": "foo1",
  "map": {
    "key1": 100,
    "key2": 100
  }
}

POST test_script/_doc
{
  "name": "foo2",
  "map": {
    "key1": 100,
    "key2": 90
  }
}

1. Adding a flag field to the documents via ingest pipeline (recommended)

Create the ingest pipeline:

PUT _ingest/pipeline/is_100_field
{
  "processors": [
    {
      "script": {
        "source": "def keys_100 = 0;\ndef keys = ctx['map'].keySet();\n\nfor (key in keys) {\n    if(ctx['map'][key] == 100){\n        keys_100 = keys_100 + 1;\n    }\n}\n\nctx.is_100 = keys.size() == keys_100;",
        "ignore_failure": true
      }
    }
  ]
}
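To check which documents will end up with is_100: true before running the pipeline, here is a plain-Python sketch that mirrors the script's logic on a _source dict. It is not part of Elasticsearch: compute_is_100 is a hypothetical helper name, and returning None is only this sketch's way of representing the cases where the Painless script would fail ("map" missing or not an object). One consequence of the script's logic worth knowing: a document whose map is empty is flagged true, because 0 matching keys equals 0 keys.

```python
from collections.abc import Mapping
from typing import Optional


def compute_is_100(source: dict) -> Optional[bool]:
    """Plain-Python mirror of the is_100_field ingest script for one _source.

    Returns None where the Painless script would fail ("map" missing or not
    an object); with "ignore_failure": true such a document is indexed
    without an is_100 field.
    """
    field_map = source.get("map")
    if not isinstance(field_map, Mapping):
        return None
    keys_100 = 0
    for key in field_map:  # same loop as the Painless: for (key in keys)
        if field_map[key] == 100:
            keys_100 += 1
    # An empty map gives 0 == 0, so it is flagged True.
    return len(field_map) == keys_100


doc1 = {"name": "foo1", "map": {"key1": 100, "key2": 100}}
doc2 = {"name": "foo2", "map": {"key1": 100, "key2": 90}}

print(compute_is_100(doc1))  # True
print(compute_is_100(doc2))  # False
```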

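You can now run your existing data through this ingest pipeline, or apply it to each new document at index time (per request as shown below, or for the whole index through the index.default_pipeline setting):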

Reindex (update existing documents in place):

POST your_index/_update_by_query?pipeline=is_100_field

Ingestion (new documents, with the document as the request body):

POST your_index/_doc?pipeline=is_100_field

After that, searching the index (GET test_script/_search) returns the documents with the new is_100 field:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test_script",
        "_id": "78_AvoQB5Gw0WET88nZE",
        "_score": 1,
        "_source": {
          "name": "foo1",
          "map": {
            "key1": 100,
            "key2": 100
          },
          "is_100": true
        }
      },
      {
        "_index": "test_script",
        "_id": "8s_AvoQB5Gw0WET8-HYO",
        "_score": 1,
        "_source": {
          "name": "foo2",
          "map": {
            "key1": 100,
            "key2": 90
          },
          "is_100": false
        }
      }
    ]
  }
}

Now you can run a regular filter, which is the most efficient option:

GET test_script/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "is_100": true
          }
        }
      ]
    }
  }
}

2. Generating the flag field on the fly via runtime fields

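The logic is the same, but the field is computed at search time instead of being stored with the data, so the script reads the document from params._source and returns its result with emit() instead of writing to ctx. You can define the runtime field in the mappings or in the search request itself: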

Mappings:

PUT test_script_runtime/
{
  "mappings": {
    "runtime": {
      "is_100": {
        "type": "boolean",
        "script": {
          "source": """
          def keys_100 = 0;
          def keys = params._source['map'].keySet();
          
          for (key in keys) {
              if(params._source['map'][key] == 100){
                  keys_100 = keys_100 + 1;
              }
          }
          
          emit(keys.size() == keys_100);
          """
        }
      }
    },
    "properties": {
      "map": {"type": "flattened"},
      "name": {"type": "text"}
    }
  }
}

Query (runtime_mappings defined in the search request):

GET test_script/_search
{
  "runtime_mappings": {
    "is_100": {
      "type": "boolean",
      "script": {
        "source": """
        def keys_100 = 0;
        def keys = params._source['map'].keySet();
        
        for (key in keys) {
            if(params._source['map'][key] == 100){
                keys_100 = keys_100 + 1;
            }
        }
        
        emit(keys.size() == keys_100);
        """
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "is_100": true
          }
        }
      ]
    }
  }
}
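A runtime field has no stored value: its script runs for each document the query has to check, on every search, which is the cost that makes option 1 the recommended one. The plain-Python sketch below imitates that behaviour and, as a hypothetical extension beyond the scripts above, also covers the general case from the question title: with no target, it matches maps whose values all agree with each other, whatever that value is. The function names and the third document foo3 are illustrative assumptions, not Elasticsearch APIs or data from the question.

```python
from collections.abc import Mapping
from typing import Any, Iterable, List, Optional


def all_keys_same_value(field_map: Mapping, target: Optional[Any] = None) -> bool:
    """True if all values in field_map are equal.

    target=None: values only need to agree with each other (any common value).
    Otherwise every value must equal target, like the is_100 script with 100.
    """
    values = list(field_map.values())
    if target is None:
        # For an empty map the generator yields nothing, so all() is True
        # and values[0] is never evaluated.
        return all(v == values[0] for v in values)
    return all(v == target for v in values)


def search_time_filter(sources: Iterable[dict], target: Optional[Any] = None) -> List[str]:
    """Recompute the flag for every document on every call, as a runtime field does."""
    hits = []
    for source in sources:
        field_map = source.get("map")
        if isinstance(field_map, Mapping) and all_keys_same_value(field_map, target):
            hits.append(source.get("name"))
    return hits


docs = [
    {"name": "foo1", "map": {"key1": 100, "key2": 100}},
    {"name": "foo2", "map": {"key1": 100, "key2": 90}},
    {"name": "foo3", "map": {"key1": 90, "key2": 90}},  # hypothetical extra document
]

print(search_time_filter(docs, target=100))  # ['foo1']
print(search_time_filter(docs))              # ['foo1', 'foo3']
```

Porting the target=None branch to Painless would mean comparing each value with the first one instead of with the literal 100; that port is not shown here.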

If you later decide to index the runtime field, the steps are described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-indexed.html
