ElasticSearch: Possible to Query with Regex Field?

Time:10-27

I have indexed data into ElasticSearch using the following index settings:

KNN_INDEX = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "index.mapping.total_fields.limit": 10000,
        "analysis": {
          "analyzer": {
            "default": {
              "type": "standard",
              "stopwords": "_english_"
            }
          }
        }
    },
    "mappings": {
        "dynamic_templates": [
            {
                "sentence_vector_template": {
                    "match": "sent_vec*",
                    "mapping": {
                        "type": "knn_vector",
                        "dimension": 384,
                        "store": True
                    }
                }
            },
            {
                "sentence_template": {
                    "match": "sentence*",
                    "mapping": {
                        "type": "text",
                        "store": True
                    }
                }
            }
        ],
        'properties': {
            "metadata": {
                "type": "object"
            }
        }
    }
}

Here are two example documents that I am indexing into ElasticSearch:

{
    # DOC 1
    "sentence_0": "Machine learning for aquatic plastic litter detection, classification and quantification (APLASTIC-Q)Large quantities of mismanaged plastic waste are polluting and threatening the health of the blue planet.",
    "sentence_1": "As such, vast amounts of this plastic waste found in the oceans originates from land.",
    "sentence_2": "It finds its way to the open ocean through rivers, waterways and estuarine systems."
},
{
    # DOC 2
    "sentence_0": "What predicts persistent early conduct problems?",
    "sentence_1": "Evidence from the Growing Up in Scotland cohortBackground There is a strong case for early identification of factors predicting life-course-persistent conduct disorder.",
    "sentence_2": "The authors aimed to identify factors associated with repeated parental reports of preschool conduct problems.",
    "sentence_3": "Method Nested caseecontrol study of Scottish children who had behavioural data reported by parents at 3, 4 and 5 years.",
    "sentence_4": "Results 79 children had abnormal conduct scores at all three time points ('persistent conduct problems') and 434 at one or two points ('inconsistent conduct problems')."
}

Each indexed document can contain a different number of sentences. When querying, I want to search across all sentences in all documents. I can already search a particular "sentence number" across all documents with the query below:

query_body = {
    "query": {
        "match": {
            "sentence_0": "persistent"
        }
    }
}
result = client.search(index=INDEX_NAME, body=query_body)
print(result)

What I am looking for, though, is something like this:

query_body = {
        "query": {
            "match": {
                "sentence_*": "persistent"
            }
        }
    }
result = client.search(index=INDEX_NAME, body=query_body)
print(result)

The above query does not work, though. Is it possible to perform such a search? Thanks.

CodePudding user response:

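Use a query_string query. Its fields parameter accepts wildcard patterns in field names (simple * wildcards, not full regular expressions), so sentence* matches sentence_0, sentence_1, and so on: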

{
  "query": {
   "query_string": {
     "fields": ["sentence*"],
     "query": "persistent"
   }
  }
}
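
To tie this back to the Python client used in the question, here is a minimal sketch of how the same request body could be built. The helper names build_query_string_body and build_multi_match_body are made up for illustration, and client and INDEX_NAME are assumed to be defined as in the question. The multi_match variant is an alternative not mentioned above: its fields list also accepts wildcards, and it does not parse the search text as query syntax, which may be preferable for raw user input.

```python
def build_query_string_body(term, field_pattern="sentence*"):
    """Build a query_string request that searches every field matching field_pattern.

    Note: query_string parses `term` with its own query syntax, so characters
    such as ':' or '(' in user input may need escaping.
    """
    return {
        "query": {
            "query_string": {
                "fields": [field_pattern],
                "query": term,
            }
        }
    }


def build_multi_match_body(term, field_pattern="sentence*"):
    """Alternative (illustrative): multi_match over the same wildcard field pattern."""
    return {
        "query": {
            "multi_match": {
                "query": term,
                "fields": [field_pattern],
            }
        }
    }


query_body = build_query_string_body("persistent")
print(query_body)

# With a configured client (assumed, as in the question):
# result = client.search(index=INDEX_NAME, body=query_body)
# print(result)
```

Either body can be passed to client.search in place of the match query from the question.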