What are you trying to do?
I am using Filebeat with a filestream input to read JSON files in ndjson format and insert them into my_index in Elasticsearch, with no additional keys.
Show me your configs.
elasticsearch.yml
# ---------------------------------- Cluster -----------------------------------
#
cluster.name: masterCluster
#
# ------------------------------------ Node ------------------------------------
#
node.name: masterNode
#
#----------------------- BEGIN SECURITY AUTO CONFIGURATION -----------------------
# Security features
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false
#----------------------- END SECURITY AUTO CONFIGURATION -------------------------
filebeat.yml
# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /home/asura/EBK/data/*.json
  parser:
    - ndjson:
        keys_under_root: true
        add_error_key: true
# ======================= Elasticsearch template setting =======================
setup.ilm.enabled: false
setup.template:
  name: "my_index_template"
  pattern: "my_index*"
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "my_index"
What do my_index and my_index_template look like?
Mappings of my_index in Kibana:
{
"mappings": {}
}
Preview of my_index_template in Kibana:
{
"template": {
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
}
}
},
"aliases": {},
"mappings": {}
}
}
What does your input file look like?
input.json
{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":32, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Blue"}
{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":36, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Grey,Blue"}
I drag and drop the above file into the watched folder and the insertion just works.
What does the data look like after inserting into Elasticsearch?
GET request: http://<host>:<my_port>/my_index/_search?filter_path=hits.hits._source
Response:
{
"hits": {
"hits": [
{
"_source": {
"@timestamp": "2022-04-21T21:49:04.084Z",
"log": {
"offset": 0,
"file": {
"path": "/home/asura/EBK/data/input.json"
}
},
"frame": 131,
"Class": "person",
"input": {
"type": "filestream"
},
"ecs": {
"version": "8.0.0"
},
"host": {
"name": "pisacha"
},
"agent": {
"ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
"id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
"name": "pisacha",
"type": "filebeat",
"version": "8.1.3"
},
"Date & Time": "Thu Oct 3 14:02:41 2019",
"Others": "Blue",
"filename": "16.avi",
"confidence": 32
}
},
{
"_source": {
"@timestamp": "2022-04-21T21:49:04.084Z",
"agent": {
"type": "filebeat",
"version": "8.1.3",
"ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
"id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
"name": "pisacha"
},
"Others": "Grey,Blue",
"filename": "16.avi",
"input": {
"type": "filestream"
},
"frame": 131,
"Class": "person",
"ecs": {
"version": "8.0.0"
},
"host": {
"name": "pisacha"
},
"confidence": 36,
"log": {
"offset": 133,
"file": {
"path": "/home/asura/EBK/data/input.json"
}
},
"Date & Time": "Thu Oct 3 14:02:41 2019"
}
},
{
"_source": {
"@timestamp": "2022-04-21T21:49:04.084Z",
"input": {
"type": "filestream"
},
"agent": {
"id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
"name": "pisacha",
"type": "filebeat",
"version": "8.1.3",
"ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab"
},
"ecs": {
"version": "8.0.0"
},
"host": {
"name": "pisacha"
},
"message": "",
"error": {
"type": "json",
"message": "Error decoding JSON: EOF"
}
}
}
]
}
}
It didn't use the template that I specified.
And surprisingly:
Preview of my_index in Kibana after Filebeat has inserted the data:
{
"mappings": {
"properties": {
"@timestamp": {
"type": "date"
},
"Class": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Date & Time": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"Others": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"agent": {
"properties": {
"ephemeral_id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"confidence": {
"type": "long"
},
"ecs": {
"properties": {
"version": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"error": {
"properties": {
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"filename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"frame": {
"type": "long"
},
"host": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"input": {
"properties": {
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"log": {
"properties": {
"file": {
"properties": {
"path": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"offset": {
"type": "long"
}
}
},
"message": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
The mapping in my_index_template is HUGE, tens of thousands of lines long, almost as if it contains every field from fields.yml.
It also created a data_stream named my_index by default.
Even after setting setup.ilm.enabled: false, the data is still inserted with all the fields from Filebeat's default index template. I have searched and tried everything I could; I need guidance here from someone who isn't shooting in the dark.
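For completeness, the template and the data stream can also be inspected directly from Kibana Dev Tools with the standard Elasticsearch APIs (the names below simply follow the configuration shown above):
# Show the index template that Filebeat loaded under the custom name
GET _index_template/my_index_template
# Show the data stream that was created instead of a plain index
GET _data_stream/my_index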
Version used for Elasticsearch, Kibana and Filebeat: 8.1.3
Please do comment if you need more info :)
References:
- Parsing ndjson: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_parsers
- For using custom index: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html#index-option-es
- For using custom templates: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-template.html
- For filtered response: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#common-options-response-filtering
CodePudding user response:
TL;DR:
I am not sure there is an option to stop Filebeat from adding those fields, but you could add a drop_fields processor to your configuration to remove them.
# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /home/asura/EBK/data/*.json
  parser:
    - ndjson:
        keys_under_root: true
        add_error_key: true
# ======================= Elasticsearch template setting =======================
setup.ilm.enabled: false
setup.template:
  name: "my_index_template"
  pattern: "my_index*"
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "my_index"

processors:
  - drop_fields:
      fields: ["agent", "ecs", "host", ...]
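As a side note, drop_fields reports an error by default when a listed field is missing from an event; the processor also accepts ignore_missing to avoid that, and the @timestamp and type fields cannot be dropped at all. A minimal sketch, keeping the elided field list from above:

processors:
  - drop_fields:
      fields: ["agent", "ecs", "host", ...]
      # Don't error on events that lack one of the listed fields
      ignore_missing: true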
If an option existed to stop Beats from adding these fields in the first place, that would be the better solution; I am just not aware of one.
EDIT:
The complete working solution relies on globally declared processors:
filebeat.inputs:
- type: filestream
  # Input processors act during the input stage of the processing pipeline
  processors:
    - drop_fields:
        fields: ["key1", "key2"]
# ---------------------------- Global Processors ------------------------------
# Global processors handle fields that are added later by Filebeat
processors:
  - drop_fields:
      fields: ["agent", "ecs", "input", "log", "host"]
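With the global processors in place, each document in my_index should be reduced to roughly the original ndjson keys plus @timestamp, which drop_fields cannot remove. A sketch of the expected _source, based on the first sample line from the question:

{
  "@timestamp": "2022-04-21T21:49:04.084Z",
  "filename": "16.avi",
  "frame": 131,
  "Class": "person",
  "confidence": 32,
  "Date & Time": "Thu Oct 3 14:02:41 2019",
  "Others": "Blue"
}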
Reference:
https://discuss.elastic.co/t/filebeat-didnt-drop-some-of-the-fields-like-agent-ecs-etc/243911/2