Elasticsearch Filebeat ignores custom index template and overwrites output index's mapping with its default fields

Time:04-25

What are you trying to do?

I am using Filebeat with a filestream input to read ndjson-formatted JSON files and insert the events into my_index in Elasticsearch, with no additional keys added to the documents.


Show me your configs.

elasticsearch.yml

# ---------------------------------- Cluster -----------------------------------
#
cluster.name: masterCluster
#
# ------------------------------------ Node ------------------------------------
#
node.name: masterNode
#
#----------------------- BEGIN SECURITY AUTO CONFIGURATION -----------------------

# Security features
xpack.security.enabled: false
xpack.security.enrollment.enabled: false

xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false

#----------------------- END SECURITY AUTO CONFIGURATION -------------------------

filebeat.yml

# ============================== Filebeat inputs ===============================

filebeat.inputs:

- type: filestream

  enabled: true

  paths:
    - /home/asura/EBK/data/*.json

  parser:
    - ndjson:
        keys_under_root: true
        add_error_key: true
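A side note on the parser block: the filestream input docs spell the option `parsers` (plural), and newer versions express root placement with `target: ""` rather than `keys_under_root` (this is an assumption based on current docs; 8.1 may still accept the form above):

```yaml
  parsers:
    - ndjson:
        # Decode keys directly at the document root
        target: ""
        add_error_key: true
```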

# ======================= Elasticsearch template setting =======================

setup.ilm.enabled: false

setup.template:
  name: "my_index_template"
  pattern: "my_index*"
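Note that even with a custom name and pattern, Filebeat still generates the template body from its bundled fields.yml. A hedged sketch of the additional settings (from the Filebeat template docs; the file path is a placeholder) that load a custom template body instead:

```yaml
setup.ilm.enabled: false

setup.template:
  name: "my_index_template"
  pattern: "my_index*"
  # Replace any template Filebeat has already installed under this name
  overwrite: true
  # Load the template body from a JSON file instead of fields.yml
  json:
    enabled: true
    path: "/path/to/my_index_template.json"
    name: "my_index_template"
```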

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:

  hosts: ["localhost:9200"]
  index: "my_index"


What do my_index and my_index_template look like?

Mappings of my_index in Kibana :

{
  "mappings": {}
}

Preview of my_index_template in Kibana :

{
  "template": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        }
      }
    },
    "aliases": {},
    "mappings": {}
  }
}
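For reference, the preview above matches what the index template API returns (a standard Elasticsearch endpoint, shown in Kibana Dev Tools syntax):

```console
GET _index_template/my_index_template
```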

What does your input file look like?

input.json

{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":32, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Blue"}
{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":36, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Grey,Blue"}

I drag and drop the above file into the watched folder and the insertion just works.


What does the data look like after inserting into Elasticsearch?

GET Request : http://<host>:<my_port>/my_index/_search?filter_path=hits.hits._source

Response :

{
  "hits": {
    "hits": [
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "log": {
            "offset": 0,
            "file": {
              "path": "/home/asura/EBK/data/input.json"
            }
          },
          "frame": 131,
          "Class": "person",
          "input": {
            "type": "filestream"
          },
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "agent": {
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha",
            "type": "filebeat",
            "version": "8.1.3"
          },
          "Date & Time": "Thu Oct 3 14:02:41 2019",
          "Others": "Blue",
          "filename": "16.avi",
          "confidence": 32
        }
      },
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "agent": {
            "type": "filebeat",
            "version": "8.1.3",
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha"
          },
          "Others": "Grey,Blue",
          "filename": "16.avi",
          "input": {
            "type": "filestream"
          },
          "frame": 131,
          "Class": "person",
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "confidence": 36,
          "log": {
            "offset": 133,
            "file": {
              "path": "/home/asura/EBK/data/input.json"
            }
          },
          "Date & Time": "Thu Oct 3 14:02:41 2019"
        }
      },
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "input": {
            "type": "filestream"
          },
          "agent": {
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha",
            "type": "filebeat",
            "version": "8.1.3",
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab"
          },
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "message": "",
          "error": {
            "type": "json",
            "message": "Error decoding JSON: EOF"
          }
        }
      }
    ]
  }
}

It didn't use the template that I specified.


And surprisingly:

Mappings of my_index in Kibana after Filebeat has inserted the data :

{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "Class": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Date & Time": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Others": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "agent": {
        "properties": {
          "ephemeral_id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "confidence": {
        "type": "long"
      },
      "ecs": {
        "properties": {
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "error": {
        "properties": {
          "message": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "filename": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "frame": {
        "type": "long"
      },
      "host": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "input": {
        "properties": {
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "log": {
        "properties": {
          "file": {
            "properties": {
              "path": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "offset": {
            "type": "long"
          }
        }
      },
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

The mapping in my_index_template is huge, tens of thousands of lines long, almost as if it contains every field from fields.yml. Filebeat also created a data stream named my_index by default.
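Since my_index was created as a data stream, it can be inspected (and removed before retrying) with the data stream APIs (standard Elasticsearch endpoints, shown in Dev Tools syntax):

```console
GET _data_stream/my_index

# Deleting the data stream also deletes its backing indices
DELETE _data_stream/my_index
```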

Even after setting setup.ilm.enabled: false, the data is still inserted with all the fields from Filebeat's default index template. I have searched and tried everything I could; I need some guidance here from someone who isn't shooting in the dark.

Versions used for Elasticsearch, Kibana and Filebeat : 8.1.3. Please do comment if you need more info :)

References:

  1. Parsing ndjson: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_parsers
  2. For using custom index: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html#index-option-es
  3. For using custom templates: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-template.html
  4. For filtered response: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#common-options-response-filtering

CodePudding user response:

TLDR;

I am not sure there is an option to stop Filebeat from adding those fields.

But you could add a drop_fields processor to remove them.

(The filebeat.inputs and setup.template sections are unchanged from the question.)
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:

  hosts: ["localhost:9200"]
  index: "my_index"

# Processors are declared at the top level of filebeat.yml,
# not under the output section
processors:
- drop_fields:
    fields: ["agent", "ecs", "host", ...]

If an option existed to stop Beats from adding some of these fields in the first place, that would be better. I am just not aware of one.


EDITS:

The complete working solution uses globally declared processors.

filebeat.inputs:
- type: filestream

  # Input Processors act during input stage of processing pipeline
  processors:
  - drop_fields:
      fields: ["key1","key2"]

# ---------------------------- Global Processors ------------------
# Global processors can drop the fields Filebeat itself adds to every event
processors:
- drop_fields:
    fields: ["agent", "ecs", "input", "log", "host"]

Reference:

https://discuss.elastic.co/t/filebeat-didnt-drop-some-of-the-fields-like-agent-ecs-etc/243911/2
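With the global drop_fields in place, re-running the insertion and repeating the filtered search from the question should leave only the original document keys plus @timestamp, which drop_fields cannot remove (a sketch using the same request as above, not verified output):

```console
GET my_index/_search?filter_path=hits.hits._source
```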
