I am using the ELK stack to save nginx access logs to Elasticsearch. Specifically, I am using Filebeat to collect them and Logstash to parse them. These are my configurations. The Filebeat input:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/access.log
      - /var/log/spring/geo/*.log

output.logstash:
  enabled: true
  hosts: ["logstash:5035"]
And this is the Logstash pipeline:

input {
  beats {
    port => 5035
  }
}

filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG} %{GREEDYDATA:http_x_forwarded_for}" ]
  }
  grok {
    match => [ "http_x_forwarded_for", "%{IP:real_client_ip}" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
  }
  useragent {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "weblogs-%{+YYYY.MM.dd}"
    document_type => "nginx_logs"
    user => "elastic"
    password => "changeme"
  }
  stdout { codec => rubydebug }
}
However, I have noticed that for some reason not all logs make it into Elasticsearch. For example, let's say that I have the following logs:
172.20.0.1 - - [17/Oct/2022:08:25:22 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "111.111.111.111"
112.111.0.1  - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico HTTP/1.1" 404 150 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "-"
111.111.0.1 - - [17/Oct/2022:12:44:44 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "111.111.111.111"
172.19.0.1 - - [17/Oct/2022:12:45:29 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "78.87.79.206, 188.114.103.233"
172.18.0.1 - - [17/Oct/2022:12:46:29 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "78.87.79.206, 188.114.103.233"
The index is created, but the log

112.111.0.1  - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico HTTP/1.1" 404 150 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "-"

does not appear when I query the index through Dev Tools. Any idea what is causing this?
EDIT: The query that I am using is the following:
GET weblogs-2022.10.17/_search
{
  "size": 100,
  "query": {
    "match_all": {}
  },
  "sort": [ { "@timestamp": { "order": "desc" } } ]
}
And the result includes only 4 logs instead of 5. Part of what I am getting is the following (I cannot include the whole response, since it is very big):
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    }
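For reference, the total for that day can also be cross-checked quickly with the _count API (assuming the same index name), which just returns the number of documents:

GET weblogs-2022.10.17/_count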
CodePudding user response:
It is not being indexed because the current grok pattern does not match this log:

112.111.0.1  - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico HTTP/1.1" 404 150 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "-"

Why does it not match? Because this log contains an extra space after the leading IP address: all the other logs have one space there, while this one has two.
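Side by side, the only difference that matters to the pattern is the whitespace right after the client IP:

112.111.0.1  - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico ...   <- two spaces after the IP, COMBINEDAPACHELOG fails
172.20.0.1 - - [17/Oct/2022:08:25:22 +0000] "GET / ...                <- one space, COMBINEDAPACHELOG matches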
You can update the first grok filter in your Logstash pipeline with the configuration below, and it will index that log as well.
grok {
  match => [
    "message",
    "%{COMBINEDAPACHELOG} %{GREEDYDATA:http_x_forwarded_for}",
    "%{IPORHOST:clientip}%{SPACE}%{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{GREEDYDATA:http_x_forwarded_for}"
  ]
}
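The key change is %{SPACE} between the client IP and the ident field: the grok SPACE pattern matches any run of whitespace, whereas the stock COMBINEDAPACHELOG pattern expects exactly one literal space there, which is why the two-space line falls through. As a general safety net, Logstash tags every event that fails all of its grok patterns with _grokparsefailure, so future mismatches can be surfaced instead of silently piling up; a minimal sketch, to be used alongside your existing outputs:

output {
  # Print only the events whose grok patterns did not match,
  # so lines like the two-space one show up immediately.
  if "_grokparsefailure" in [tags] {
    stdout { codec => rubydebug }
  }
}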