I have indexed data into Elasticsearch using the following index settings:
```python
KNN_INDEX = {
    "settings": {
        "index.knn": True,
        "index.knn.space_type": "cosinesimil",
        "index.mapping.total_fields.limit": 10000,
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "standard",
                    "stopwords": "_english_"
                }
            }
        }
    },
    "mappings": {
        "dynamic_templates": [
            {
                "sentence_vector_template": {
                    "match": "sent_vec*",
                    "mapping": {
                        "type": "knn_vector",
                        "dimension": 384,
                        "store": True
                    }
                }
            },
            {
                "sentence_template": {
                    "match": "sentence*",
                    "mapping": {
                        "type": "text",
                        "store": True
                    }
                }
            }
        ],
        "properties": {
            "metadata": {
                "type": "object"
            }
        }
    }
}
```
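To see how the two dynamic templates route incoming fields, here is a small pure-Python sketch. It assumes each template's "match" pattern uses only the `*` wildcard and that templates are checked in order with the first match winning; `DYNAMIC_TEMPLATES` and `resolve_template` are hypothetical names, and `fnmatch` only approximates the server-side matching.

```python
from fnmatch import fnmatchcase

# (template name, "match" pattern, mapped type), in the same order as
# the dynamic_templates list in KNN_INDEX.
DYNAMIC_TEMPLATES = [
    ("sentence_vector_template", "sent_vec*", "knn_vector"),
    ("sentence_template", "sentence*", "text"),
]


def resolve_template(field_name):
    """Return (template_name, field_type) for the first template whose
    pattern matches field_name, or None if no template pattern matches."""
    for name, pattern, field_type in DYNAMIC_TEMPLATES:
        # fnmatchcase approximates the simple "*" wildcard of "match";
        # unlike the server, it also understands "?" and "[...]".
        if fnmatchcase(field_name, pattern):
            return name, field_type
    return None


for field in ["sentence_0", "sentence_4", "sent_vec_0", "metadata"]:
    print(field, "->", resolve_template(field))
```

Every `sentence_N` field resolves to `('sentence_template', 'text')`, so each one becomes a separately searchable text field, while `sent_vec_N` fields resolve to the `knn_vector` template. `metadata` prints `None` here only because no pattern matches it; in the real index it is mapped explicitly under "properties", so dynamic templates never apply to it.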
Here are a couple of example documents that I am indexing into Elasticsearch:
```python
{
    # DOC 1
    "sentence_0": "Machine learning for aquatic plastic litter detection, classification and quantification (APLASTIC-Q)Large quantities of mismanaged plastic waste are polluting and threatening the health of the blue planet.",
    "sentence_1": "As such, vast amounts of this plastic waste found in the oceans originates from land.",
    "sentence_2": "It finds its way to the open ocean through rivers, waterways and estuarine systems."
},
{
    # DOC 2
    "sentence_0": "What predicts persistent early conduct problems?",
    "sentence_1": "Evidence from the Growing Up in Scotland cohortBackground There is a strong case for early identification of factors predicting life-course-persistent conduct disorder.",
    "sentence_2": "The authors aimed to identify factors associated with repeated parental reports of preschool conduct problems.",
    "sentence_3": "Method Nested caseecontrol study of Scottish children who had behavioural data reported by parents at 3, 4 and 5 years.",
    "sentence_4": "Results 79 children had abnormal conduct scores at all three time points ('persistent conduct problems') and 434 at one or two points ('inconsistent conduct problems')."
}
```
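Because each document carries a different number of `sentence_N` fields, a small helper can produce that shape from a plain list of sentences. This is a hypothetical sketch (`build_document` and `EMBEDDING_DIM` are not part of any client library); it also assumes, based only on the `sent_vec*` template, that each sentence's embedding would go in a matching `sent_vec_N` field of 384 floats.

```python
EMBEDDING_DIM = 384  # must match "dimension" in the sent_vec* template


def build_document(sentences, vectors=None):
    """Build a document with one sentence_<i> field per sentence and,
    if vectors are given, one sent_vec_<i> field per embedding."""
    doc = {f"sentence_{i}": text for i, text in enumerate(sentences)}
    if vectors is None:
        return doc
    if len(vectors) != len(sentences):
        raise ValueError("expected exactly one vector per sentence")
    for i, vec in enumerate(vectors):
        if len(vec) != EMBEDDING_DIM:
            raise ValueError(
                f"sent_vec_{i} has {len(vec)} values, expected {EMBEDDING_DIM}"
            )
        doc[f"sent_vec_{i}"] = [float(x) for x in vec]
    return doc


doc = build_document([
    "What predicts persistent early conduct problems?",
    "The authors aimed to identify factors associated with repeated parental reports of preschool conduct problems.",
])
print(sorted(doc))  # ['sentence_0', 'sentence_1']
```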
Each document can have a different number of sentences. When querying, I want to search all sentences across all documents. I can already search one particular sentence number across all documents with the query below:
```python
query_body = {
    "query": {
        "match": {
            "sentence_0": "persistent"
        }
    }
}
result = client.search(index=INDEX_NAME, body=query_body)
print(result)
```
But what I am looking for is something like this:
```python
query_body = {
    "query": {
        "match": {
            "sentence_*": "persistent"
        }
    }
}
result = client.search(index=INDEX_NAME, body=query_body)
print(result)
```
The above query does not work, though. Is it possible to perform such a search? Thanks.
Answer:
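Use a `query_string` query. Its `fields` parameter accepts wildcard patterns (not regular expressions) in field names, so `sentence*` covers `sentence_0`, `sentence_1`, and so on: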
```json
{
  "query": {
    "query_string": {
      "fields": ["sentence*"],
      "query": "persistent"
    }
  }
}
```
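In the Python code from the question, the answer's body can be built by a small helper and passed to `client.search` as before. This is a sketch with hypothetical helper names; `client` and `INDEX_NAME` are the question's own objects and are not created here. It also shows `multi_match`, which likewise expands wildcards in `fields`, as an alternative.

```python
def query_string_body(text, field_pattern="sentence*"):
    """Body for a query_string query over every field matching field_pattern."""
    return {
        "query": {
            "query_string": {
                "fields": [field_pattern],
                "query": text,
            }
        }
    }


def multi_match_body(text, field_pattern="sentence*"):
    """Alternative body: multi_match also expands wildcards in "fields",
    but treats text as plain input rather than Lucene query syntax."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [field_pattern],
            }
        }
    }


query_body = query_string_body("persistent")
# With the client from the question (not created in this sketch):
# result = client.search(index=INDEX_NAME, body=query_body)
# print(result)
```

The practical difference is input handling: `query_string` parses its input with Lucene query syntax, so user-supplied text containing reserved characters such as `:`, `(` or `/` can fail to parse or change meaning, whereas `multi_match` analyzes the input as plain text. For a single word like `persistent`, either body should match any `sentence_N` field that contains it.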