Skip indexing existing documents when reindexing in ElasticSearch

Time:02-17

I'm trying to move data between two ElasticSearch instances. Is there a way to skip the documents that already exist in the target index?

from opensearchpy import OpenSearch, RequestsHttpConnection, helpers

def reindex_data_to_data_curation_es(es_src, es_des):
    try:
        # src_idx and tar_idx are the source and target index names, defined elsewhere
        helpers.reindex(es_src, src_idx, tar_idx, target_client=es_des, query={'query': {'match_all': {}}})
    except Exception as e:
        print("timed out", str(e))

CodePudding user response:

You cannot skip them when reading from the source index, but you can make sure not to overwrite them in the target index if they already exist. Simply add op_type='create' so that documents already present in the target index are left untouched:

helpers.reindex(es_src, src_idx, tar_idx, target_client=es_des, op_type='create', query={'query': {'match_all': {}}})
                                                                       ^
                                                                       |
                                                                    add this
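
To illustrate what op_type='create' does, here is a rough sketch of the same idea done by hand with helpers.scan and helpers.bulk. It may also be useful if the helpers.reindex in your installed client version does not accept op_type (that is an assumption worth checking against your version). The function names to_create_actions, non_conflict_errors and copy_missing_documents are made up for this example and are not part of the library. With the create operation, a document whose _id already exists in the target is rejected with a 409 version conflict instead of being overwritten, so those conflicts are filtered out of the error list here.

```python
def to_create_actions(hits, target_index):
    """Turn scan hits into bulk actions that only create missing documents."""
    for hit in hits:
        yield {
            # "create" is rejected with HTTP 409 if the _id already exists
            # in the target index, so existing documents are not overwritten.
            "_op_type": "create",
            "_index": target_index,
            "_id": hit["_id"],
            "_source": hit["_source"],
        }


def non_conflict_errors(errors):
    """Drop 409 conflicts (document already exists); keep real failures."""
    return [e for e in errors if e.get("create", {}).get("status") != 409]


def copy_missing_documents(es_src, es_des, src_idx, tar_idx):
    """Copy documents from src_idx to tar_idx, skipping ids already present."""
    # Imported here so the two helpers above can be tried without a client installed.
    from opensearchpy import helpers

    hits = helpers.scan(es_src, index=src_idx, query={"query": {"match_all": {}}})
    # raise_on_error=False collects per-document errors instead of raising
    # on the first 409 conflict.
    created, errors = helpers.bulk(
        es_des, to_create_actions(hits, tar_idx), raise_on_error=False
    )
    return created, non_conflict_errors(errors)
```

Calling copy_missing_documents(es_src, es_des, src_idx, tar_idx) returns the number of newly created documents and a list of any errors other than "already exists" conflicts.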