I am trying to bulk index a large number of documents into Elasticsearch using Python. After reading the documentation, I found this example. (Edit: I am using that exact code, with only the index name changed to a data stream.)
The example works great when I am indexing into a normal index. However, when I try to index into a data stream, even a brand-new one that accepts dynamic content, I get this error:
Traceback (most recent call last):
  File "/Users/Downloads/elasticsearch-py-main/examples/bulk-ingest/bulk-ingest.py", line 111, in <module>
    main()
  File "/Users/Downloads/elasticsearch-py-main/examples/bulk-ingest/bulk-ingest.py", line 102, in main
    for ok, action in bulk(
  File "/opt/homebrew/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 524, in bulk
    for ok, item in streaming_bulk(
  File "/opt/homebrew/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 438, in streaming_bulk
    for data, (ok, info) in zip(
  File "/opt/homebrew/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 355, in _process_bulk_chunk
    yield from gen
  File "/opt/homebrew/lib/python3.9/site-packages/elasticsearch/helpers/actions.py", line 274, in _process_bulk_chunk_success
    raise BulkIndexError(f"{len(errors)} document(s) failed to index.", errors)
elasticsearch.helpers.BulkIndexError: 2 document(s) failed to index.
I cannot find any information on this error. How can I bulk index my data into a data stream using the Elasticsearch Python client?
CodePudding user response:
This is probably because data streams are append-only: when sending documents to a data stream, the bulk API only accepts the create action, not the default index action. Each action line in the bulk request should look like this:
{ "create": {"_id": "123"}}
{ "field": "value" }
With the Python bulk helpers, you achieve the same thing by explicitly setting '_op_type': 'create' in each of your bulk actions.
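Here is a minimal sketch of how that looks with the bulk helper, assuming a data stream named my-data-stream already exists and the cluster is reachable at localhost:9200 (both are placeholders for your own setup, and generate_actions stands in for however you produce your documents):

from datetime import datetime, timezone

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

client = Elasticsearch("http://localhost:9200")

def generate_actions():
    # Each action targets the data stream and uses the "create" op type,
    # which is the only op type data streams accept.
    for i in range(1000):
        yield {
            "_op_type": "create",
            "_index": "my-data-stream",
            "_source": {
                # Data streams require an @timestamp field in every document.
                "@timestamp": datetime.now(timezone.utc).isoformat(),
                "field": f"value-{i}",
            },
        }

successes, errors = bulk(client, generate_actions())
print(f"Indexed {successes} document(s), {len(errors)} error(s).")

Note the @timestamp field: documents written to a data stream must carry the stream's timestamp field, so if your documents lack one you will see failures even after switching the op type to create.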