Filebeat - Logstash - Multiple Config Files - Duplicate data

Time:01-20

I am new to Logstash and Filebeat. I am trying to set up multiple config files for my Logstash instance, using Filebeat to send data to Logstash. Even though I have filters in both Logstash config files, I am getting duplicate data.

Logstash config file - 1:

input {
  beats {
    port => 5045
  }
}

filter {
  if [fields][env] == "prod" {
    grok {
      match => { "message" => "%{LOGLEVEL:loglevel}] %{GREEDYDATA:message}$" }
      overwrite => [ "message" ]
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }

  elasticsearch {
    hosts => ["https://172.17.0.2:9200"]
    index => "logstash-myapp-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "password"
    ssl => true
    cacert => "/usr/share/logstash/certs/http_ca.crt"
  }
}

Logstash config file - 2:

input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][env] == "dev" {
    grok {
      match => { "message" => "%{LOGLEVEL:loglevel}] %{GREEDYDATA:message}$" }
      overwrite => [ "message" ]
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }

  elasticsearch {
    hosts => ["https://172.17.0.2:9200"]
    index => "logstash-myapp-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "password"
    ssl => true
    cacert => "/usr/share/logstash/certs/http_ca.crt"
  }
}

Logfile Content:

[INFO] First Line
[INFO] Second Line
[INFO] Third Line

Filebeat config:

filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /root/data/logs/*.log
  fields:
    app: test
    env: dev

output.logstash:
  # The Logstash hosts
  hosts: ["172.17.0.4:5044"]

I know that even with multiple config files, Logstash processes every event against all the filters in all the config files. That is why we added a conditional on "fields.env" to each config file. Since "fields.env" is "dev", I am expecting 3 lines to be sent to Elasticsearch, but it is sending 6 lines, i.e. duplicate data. Please help.

CodePudding user response:

The problem is that your two configuration files get merged into a single pipeline: not only the filters, but also the outputs.

So each log line entering the pipeline through either input will go through all filters (subject to their conditionals) and then through all outputs. Outputs are unconditional unless you explicitly wrap them in an if block.

So the first log line, [INFO] First Line, coming in on port 5044, will only match the filter guarded by [fields][env] == "dev", but it will then pass through both elasticsearch outputs, which is why it ends up twice in your Elasticsearch.

The easy solution is to remove the output section from one of the configuration files, so that each event goes through a single output.
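Alternatively, if both files must keep an output, you can guard each elasticsearch output with the same conditional used in the filter, since Logstash supports if blocks in the output section. A minimal sketch for the "dev" file (hosts, credentials, and index name are taken from the question; the prod file would use the matching "prod" condition):

output {
  if [fields][env] == "dev" {
    elasticsearch {
      hosts => ["https://172.17.0.2:9200"]
      index => "logstash-myapp-%{+YYYY.MM.dd}"
      user => "elastic"
      password => "password"
      ssl => true
      cacert => "/usr/share/logstash/certs/http_ca.crt"
    }
  }
}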

The better solution is to create separate pipelines, so that each config file runs in its own isolated pipeline with its own inputs, filters, and outputs.
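With multiple pipelines, events received on port 5044 can never reach the outputs defined in the other file. A sketch of pipelines.yml (the pipeline ids and config file paths are illustrative assumptions, adjust them to your installation):

# /etc/logstash/pipelines.yml
- pipeline.id: dev-pipeline
  path.config: "/etc/logstash/conf.d/dev.conf"
- pipeline.id: prod-pipeline
  path.config: "/etc/logstash/conf.d/prod.conf"

With this in place, you can also drop the [fields][env] conditionals from the filters, because each pipeline only ever sees events from its own beats input.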
