How can I avoid "Data too large" in ELK / Elasticsearch bulk inserts?


I'm sending data daily to my ELK stack via the Bulk helper (https://metacpan.org/pod/Search::Elasticsearch::Client::7_0::Bulk). Sometimes, and more often recently, I receive a "Data too large" error. The first part of my data is received, but after this error my sending script stops and I end up with incomplete data.

As far as I understand (correct me if I'm wrong), this happens when my stack runs into memory issues while processing the data it has already received. I assume I could send the rest of the data after waiting a while, because the next day the same thing happens: the first batch of my data is processed, the rest is rejected with "Data too large".

I saw that I can add an on_error callback, but I have no clue what I could do in it. My idea would be to add a delay and retry after some time.
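Roughly what I have in mind (the index name, retry count and delay are just placeholders, and I'm not sure this is the right way to use on_error):

    use strict;
    use warnings;
    use Data::Dumper;
    use Search::Elasticsearch;

    my $es = Search::Elasticsearch->new( nodes => ['localhost:9200'] );

    my $bulk = $es->bulk_helper(
        index    => 'my_index',    # placeholder index name
        on_error => sub {
            # As I read the Bulk helper docs, the callback gets the failed
            # action, the item response and its position in the request
            my ( $action, $response, $i ) = @_;
            warn "Bulk action $i failed:\n" . Dumper($response);
        },
    );

    # Retry a flush a few times with a pause, hoping the cluster recovers
    # from the memory pressure in between
    sub flush_with_retry {
        my ( $helper, $tries, $delay ) = @_;
        for my $attempt ( 1 .. $tries ) {
            return 1 if eval { $helper->flush; 1 };
            warn "Flush attempt $attempt failed: $@";
            sleep $delay;
        }
        return 0;
    }

    # ... $bulk->index(...) calls for my documents go here ...

    flush_with_retry( $bulk, 5, 60 )
        or die "Giving up after repeated bulk failures\n";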

Can anyone give me a hint how to achieve that? And are there any ideas how to avoid the issue in the first place? I already increased the heap size some time ago, but after two months the issue reoccurred.

CodePudding user response:

You'd need to check your Elasticsearch logs and the full response that Elasticsearch sends back (e.g. was it a 429?). However, heap pressure can cause this, and you'd probably need to dig into why you are experiencing it.
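If you want a quick look at the circuit breakers from the same client, something like this should do it ("Data too large" comes from a tripped breaker; the node address is an assumption and I'm going from memory on the response layout):

    use strict;
    use warnings;
    use Data::Dumper;
    use Search::Elasticsearch;

    my $es = Search::Elasticsearch->new( nodes => ['localhost:9200'] );

    # Circuit breaker stats per node: limits, estimated usage and how
    # often each breaker has tripped
    my $stats = $es->nodes->stats( metric => 'breaker' );
    print Dumper( $stats->{nodes} );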

The other option is to reduce the size of the requests you are sending.
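With the Bulk helper that means lowering the flush thresholds, roughly like this (the numbers are just a starting point to tune, not a recommendation):

    # $es: a Search::Elasticsearch client, as in the snippet above
    my $bulk = $es->bulk_helper(
        index     => 'my_index',   # placeholder index name
        max_count => 250,          # flush after this many actions
        max_size  => 500_000,      # ...or once the buffer reaches ~500 KB
        max_time  => 30,           # ...or at least every 30 seconds
    );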

CodePudding user response:

Update: remembering my "experience" with Java, I simply restarted my ELK stack and the next import went through smoothly.

So even though 512m of memory seems a bit low, it worked after a restart. I will check again today and over the next few days. My plan for now:

  1. Increase memory (sketch below)
  2. Schedule a nightly restart (sketch below)
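For reference, how I'd do both (the paths assume a package install managed by systemd; Docker setups and sensible heap sizes will differ, 2g is just an example):

    # /etc/elasticsearch/jvm.options.d/heap.options -- raise the heap
    -Xms2g
    -Xmx2g

    # /etc/cron.d/elasticsearch-restart -- nightly restart at 03:00 as a stopgap
    0 3 * * * root systemctl restart elasticsearch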