How to delete a document from Elasticsearch cluster by searching exact using python

Time:09-17

You can insert the data with the commands below. The example has 3 documents, but my real index holds more than 1000.

test = [
    {'id': 1, 'name': 'A', 'subject': ['Maths', 'Accounting'],
     'type': 'Contract', 'Location': 'NY'},
    {'id': 2, 'name': 'AB', 'subject': ['Physics', 'Engineering'],
     'type': 'Permanent', 'Location': 'NY'},
    {'id': 3, 'name': 'ABC', 'subject': ['Maths', 'Engineering'],
     'type': 'Permanent', 'Location': 'NY'},
]

from elasticsearch import Elasticsearch
es = Elasticsearch()
for e in test:
    es.index(index="myindex", body=e, id=e['id'])
  • I need to delete the documents from the index whose name matches AB

I tried the commands below:

names = ['AB']
for name in names:
   es.delete_by_query(index="myindex", body={'name': name})
  • I got a parsing exception: Unknown key for VALUE_STRING in [name]

  • While inserting I could simply skip the documents named AB, but here they are already in the index and I need to delete them

CodePudding user response:

You need to provide an actual query in the body argument of the delete_by_query function, something like this:

for name in names:
    es.delete_by_query(index="myindex", body={
        "query": {
            "match": {
                "name": name
            }
        }
    })

# or, if you can't update the doc mapping for the name field:

for name in names:
    es.delete_by_query(index="myindex", body={
        "query": {
            "term": {
                "name.keyword": name
            }
        }
    })

Remember that the term query used here only gives you exact matches.
Further reading, from the API documentation: https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html#elasticsearch.Elasticsearch.delete_by_query
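If there are many names to remove, the loop can probably be replaced by a single request. The sketch below is only an illustration: it assumes the 7.x Python client from the link above (where delete_by_query accepts body=), an index created with the default dynamic mapping (so that name.keyword exists), and two helper names, build_delete_body and delete_by_names, that are made up for this example.

```python
def build_delete_body(names, field="name.keyword"):
    """Build a delete_by_query body matching any of the given exact values.

    A `terms` query matches documents whose field equals at least one of
    the listed values, so one request covers the whole list of names.
    """
    return {"query": {"terms": {field: list(names)}}}


def delete_by_names(es, names, index="myindex"):
    """Delete every document whose name is exactly one of `names`.

    `es` is expected to be an elasticsearch.Elasticsearch client (7.x style).
    Returns the number of deleted documents reported by the cluster.
    """
    response = es.delete_by_query(index=index, body=build_delete_body(names))
    return response["deleted"]


# Usage (needs a running cluster):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   delete_by_names(es, ["AB"])
```

Keeping the body construction in its own function makes it easy to print and inspect the query before anything is actually deleted.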

Update on the used query:

  • term is used to query exact values; the docs also state that term should not be used on text fields

  • on text fields you should use a match query, which analyzes the search text and returns documents ranked by a relevance score

  • to get an exact match in this case you could run a term query on the id, or update your doc mapping so that the name field is a keyword

  • you could also query the .keyword sub-field, but this is not advised as a general solution because it only works when the mapping actually defines that sub-field (Elasticsearch's default dynamic mapping adds it to string fields; an explicit mapping may not)
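To make the keyword-mapping option above concrete, here is a rough sketch. It assumes Elasticsearch 7.x typeless mappings and the 7.x Python client; the helper names (build_index_body, exact_name_query, create_index) are invented for the example. Note that the type of an existing field cannot be changed in place, so in practice this means creating a new index with the mapping and reindexing the documents into it.

```python
def build_index_body():
    """Index settings that map `name` as a keyword (stored unanalyzed)."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "keyword"},
            }
        }
    }


def exact_name_query(name):
    """With `name` mapped as keyword, a term query on it is an exact match."""
    return {"query": {"term": {"name": name}}}


def create_index(es, index="myindex"):
    """Create the index with the keyword mapping (fails if it already exists)."""
    es.indices.create(index=index, body=build_index_body())


# Usage (needs a running cluster and an index name that is not yet taken):
#   create_index(es, "myindex")
#   es.delete_by_query(index="myindex", body=exact_name_query("AB"))
```

With this mapping, the term query matches only the document named AB, not A or ABC, and no .keyword suffix is needed.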
