elasticsearch _bulk index - concatenated and hashed id

Time:08-25

I'm not sure whether this is possible inside _bulk, and I don't know the exact syntax to use, but I'd like to create an _id that combines a few fields from the document and is then hashed.

So, something like this (note the _id attribute):

POST /_bulk
{"index":{"_index":"eocs-technical-2022.08.24","_type":"_doc", "_id": "${hash(doc['@timestamp'] + doc['message'] + doc['instance_id'])}"}}
{"@timestamp":"2022-08-24T13:49:34.428+0200","message":"This is testing message","hostname":"testcomputer.local","ip":"-","service_name":"test-service","instance_id":"c0","build.version":"master-d723731300570fd1b2d241c4849b223673d1c8d8","source":"com.example.ELKTest","level":"DEBUG","thread_name":"scheduler-1"}

Is that possible?

Thanks

Answer:

It's possible to do this with an ingest pipeline that uses a fingerprint processor, which hashes the listed fields and writes the result to the _id field:

PUT _ingest/pipeline/id-hasher
{
  "processors": [
    {
      "fingerprint": {
        "target_field": "_id",
        "fields": [
          "@timestamp",
          "message",
          "instance_id"
        ]
      }
    }
  ]
}

Then reference that pipeline in your bulk call. The "dummy" _id is only a placeholder that the pipeline overwrites:

POST /_bulk?pipeline=id-hasher
{"index":{"_index":"eocs-technical-2022.08.24","_type":"_doc", "_id": "dummy"}}
{"@timestamp":"2022-08-24T13:49:34.428+0200","message":"This is testing message","hostname":"testcomputer.local","ip":"-","service_name":"test-service","instance_id":"c0","build.version":"master-d723731300570fd1b2d241c4849b223673d1c8d8","source":"com.example.ELKTest","level":"DEBUG","thread_name":"scheduler-1"}

The generated _id for the sample message above will be A3t5JHZE4ejqYoxEkfrnyTKBfFY.
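If the data is sent from a script rather than from the console, the same bulk request can be built and posted with the Python standard library. This is a minimal sketch, assuming an unsecured cluster at http://localhost:9200 (ES_URL is a hypothetical setting) and the id-hasher pipeline defined above. It keeps the placeholder _id from the answer and omits _type, which is deprecated since Elasticsearch 7.0 and removed in 8.0.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; add TLS/auth for a real deployment.
ES_URL = "http://localhost:9200"


def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body: one action line and one source line per doc."""
    lines = []
    for doc in docs:
        # Placeholder _id, as in the answer; the id-hasher pipeline replaces it.
        lines.append(json.dumps({"index": {"_index": index, "_id": "dummy"}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(body, pipeline="id-hasher"):
    """POST the body to _bulk, running every document through the given pipeline."""
    req = urllib.request.Request(
        f"{ES_URL}/_bulk?pipeline={pipeline}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


doc = {
    "@timestamp": "2022-08-24T13:49:34.428+0200",
    "message": "This is testing message",
    "instance_id": "c0",
}
body = build_bulk_body("eocs-technical-2022.08.24", [doc])
print(body)
# send_bulk(body)  # requires a reachable cluster with the id-hasher pipeline
```

Hashing stays on the Elasticsearch side here, so every client that uses the pipeline gets the same ids without sharing any hashing code.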

Answer:

TL;DR:

This is not possible with the bulk API.

The application uploading the data should compute the hashed _id itself.
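A minimal sketch of that client-side approach, in Python (chosen for illustration; the question does not name a language). The field list, the ASCII unit-separator join and SHA-256 are assumptions rather than anything Elasticsearch prescribes, and the resulting ids will not match the fingerprint processor's output. The separator keeps ("ab", "c") and ("a", "bc") from producing the same concatenated string.

```python
import hashlib
import json

# Hypothetical choices: the three fields from the question and a separator
# that is unlikely to occur inside the field values.
ID_FIELDS = ("@timestamp", "message", "instance_id")
SEPARATOR = "\x1f"  # ASCII unit separator


def make_id(doc, fields=ID_FIELDS):
    """Join the selected field values and return their SHA-256 hex digest.

    Raises KeyError if a field is missing, instead of silently hashing a blank.
    """
    joined = SEPARATOR.join(str(doc[f]) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body with a client-computed _id per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": make_id(doc)}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


doc = {
    "@timestamp": "2022-08-24T13:49:34.428+0200",
    "message": "This is testing message",
    "instance_id": "c0",
}
print(build_bulk_body("eocs-technical-2022.08.24", [doc]))
```

Because the _id depends only on the document's content, re-sending the same event with the "index" action replaces the existing document instead of creating a duplicate; using the "create" action instead would make the repeat fail with a version conflict.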
