Logstash S3 input plugin - filter based on time modified

I have a Logstash container that is configured to read objects from S3. The requirement is to filter out old objects; say, objects older than 3 months should be dropped.

I noticed that I can expose the s3 metadata, so I have the following metadata in each event:

"@metadata" => {
    "s3" => {
                          "etag" => "\"xxx"",
                "content_length" => 33,
                      "metadata" => {},
                    "version_id" => "null",
                 "accept_ranges" => "bytes",
                 "last_modified" => 2021-12-21T13:30:28.000Z,

Maybe there is a filter/Ruby snippet that I can use to identify "old" objects and drop them?

Any help is appreciated!

CodePudding user response:

You are right, there is a drop filter in Logstash, and you can use it in combination with an if conditional to drop events that match a condition, as in this example from the documentation:

filter {
  if [loglevel] == "debug" {
    drop { }
  }
}

Besides this, you will also need a way to check how old an event is. For that you can use the age filter. Quoting the documentation:

This filter calculates the age of an event by subtracting the event timestamp from the current timestamp. You can use this plugin with the drop filter plugin to drop Logstash events that are older than some threshold.

This plugin seems to work directly with the @timestamp field, so you'll have to do a little bit of shuffling of your fields, as in the sketch below.
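
For example, a minimal sketch of that shuffling, assuming the s3 input exposes [@metadata][s3][last_modified] as shown in the question:

    filter {
      # make the S3 object's last_modified the event timestamp, so the
      # age filter measures object age rather than ingestion time
      mutate {
        copy => { "[@metadata][s3][last_modified]" => "@timestamp" }
      }
    }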

Good luck!

CodePudding user response:

So, after some investigation and help from the Logstash community, I managed to handle this requirement as follows:

  1. Use the mutate filter to copy the S3 last modified time into @timestamp:

    mutate { copy => { "[@metadata][s3][last_modified]" => "@timestamp" } }

  2. Use the age filter to drop events whose (now copied) timestamp is older than a threshold in seconds; see the combined example after these steps:

    age {}
    if [@metadata][age] > 7776000 { drop {} }   # 7776000 s ~ 90 days (3 months)
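
Putting the two steps together, a combined sketch of the filter section (7776000 seconds is the question's 3 months taken as 90 days; adjust the threshold as needed):

    filter {
      # 1. copy the S3 object's last_modified into @timestamp
      mutate {
        copy => { "[@metadata][s3][last_modified]" => "@timestamp" }
      }

      # 2. age sets [@metadata][age] to the event's age in seconds,
      #    measured against @timestamp; then drop old events
      age {}
      if [@metadata][age] > 7776000 {   # ~3 months (90 days)
        drop {}
      }
    }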
