Regex to consolidate Apache Logs on repeating lines

Time:12-21

I am manually analyzing my Apache logs; ignore the why, it doesn't matter ;)

Anyway, I am irritated by how many entries there are for streamed videos. Below is an example. I would love to have a regular expression that matches repeating lines, ignoring the minor variations in the timestamp and the bytes transferred, but still paying attention to changing IP addresses, so that it removes nothing more than the constantly repeating lines.

Example input:

172.59.152.20 - - [19/Dec/2022:04:52:54 +0000] "GET /video.mp4 HTTP/1.1" 206 504267 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:52:55 +0000] "GET /video.mp4 HTTP/1.1" 206 180747 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:52:56 +0000] "GET /video.mp4 HTTP/1.1" 206 40261 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:52:56 +0000] "GET /video.mp4 HTTP/1.1" 206 1427820 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:52:57 +0000] "GET /video.mp4 HTTP/1.1" 206 47938302 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:10 +0000] "GET /video.mp4 HTTP/1.1" 206 9304011 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:17 +0000] "GET /video.mp4 HTTP/1.1" 206 11115723 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:23 +0000] "GET /video.mp4 HTTP/1.1" 206 10468683 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:29 +0000] "GET /video.mp4 HTTP/1.1" 206 4386507 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:36 +0000] "GET /video.mp4 HTTP/1.1" 206 5292363 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:42 +0000] "GET /video.mp4 HTTP/1.1" 206 6780555 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:49 +0000] "GET /video2.mp4 HTTP/1.1" 206 3739467 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:51 +0000] "GET /video2.mp4 HTTP/1.1" 206 202874 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:52 +0000] "GET /video2.mp4 HTTP/1.1" 206 9592368 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:02 +0000] "GET /video2.mp4 HTTP/1.1" 206 7233483 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:08 +0000] "GET /video2.mp4 HTTP/1.1" 206 7427595 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:15 +0000] "GET /video.mp4 HTTP/1.1" 206 10867691 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:21 +0000] "GET /video.mp4 HTTP/1.1" 206 6845259 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:28 +0000] "GET /video.mp4 HTTP/1.1" 206 11568651 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:34 +0000] "GET /video.mp4 HTTP/1.1" 206 10856907 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:41 +0000] "GET /video.mp4 HTTP/1.1" 206 8139339 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:49 +0000] "GET /video.mp4 HTTP/1.1" 206 10792203 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:56 +0000] "GET /video.mp4 HTTP/1.1" 206 10220651 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:55:02 +0000] "GET /video.mp4 HTTP/1.1" 206 10468683 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:55:09 +0000] "GET /video.mp4 HTTP/1.1" 206 9109899 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"

Desired Output:

172.59.152.20 - - [19/Dec/2022:04:53:42 +0000] "GET /video.mp4 HTTP/1.1" 206 6780555 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:53:49 +0000] "GET /video2.mp4 HTTP/1.1" 206 3739467 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"
172.59.152.20 - - [19/Dec/2022:04:54:15 +0000] "GET /video.mp4 HTTP/1.1" 206 10867691 "Mozilla/5.0 (iPhone; CPU iPhone OS 15_6_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6.1 Mobile/15E148 Safari/604.1"

It doesn't really matter whether the first or the last of the matching lines is kept.

I did look into uniq, but it didn't seem to be able to keep track of the IP, skip the timestamp, match on the content, and then ignore the rest of the line.

For example, here is a regex for each line:

Group 1 - IP, 2 - Timestamp, 3 - Content, 4 - HTTP Response Code, 5 - Bytes, 6 - Browser / OS
(\d+.\d+.\d+.\d+) - - (.*) \+0000(.*)HTTP\/1.1\" (\d{3}) (\d{1,8})(.*)

If I wanted to ignore groups 2 and 5 but keep the rest, how would I do that?

CodePudding user response:

This is a great application of back references. Capture the whole first line in group 1. Within it, capture the IP address in group 2 and the quoted request (which contains the filename) in group 3. Use those captures to skip over every following line with the same IP and request, then replace the whole match with group 1. Here's the regex for a replace that retains the first line of each group:

^(((?:\d{1,3}\.){3}\d{1,3})[^"]+("[^"]*?").*?(?:\n|$))(\2[^"]+\3.*?(\n|$))*

You need to set the g (global replace) and m (multi-line processing) flags. You can try it against your example in Regex101 in replace mode.
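
If you would rather script this than paste into Regex101, here is a minimal Python sketch of the same replace. It assumes the whole log fits in memory as one string; the function name `consolidate` and the shortened sample lines are made up for illustration. In Python, `re.sub` already replaces every match (the g behaviour), `re.MULTILINE` stands in for the m flag, and the replacement is written `\1` rather than `$1`.

```python
import re

# Same pattern as above. Group 1 is the first line of a run, group 2 the IP,
# group 3 the quoted request; the trailing group swallows the repeated lines.
PATTERN = re.compile(
    r'^(((?:\d{1,3}\.){3}\d{1,3})[^"]+("[^"]*?").*?(?:\n|$))'
    r'(\2[^"]+\3.*?(\n|$))*',
    re.MULTILINE,
)


def consolidate(log_text: str) -> str:
    """Collapse each run of consecutive lines that share an IP and request
    into the first line of that run (hypothetical helper name)."""
    return PATTERN.sub(r"\1", log_text)


if __name__ == "__main__":
    # Shortened, made-up lines in the same shape as the log above.
    sample = (
        '10.0.0.1 - - [19/Dec/2022:04:52:54 +0000] "GET /video.mp4 HTTP/1.1" 206 100 "UA"\n'
        '10.0.0.1 - - [19/Dec/2022:04:52:55 +0000] "GET /video.mp4 HTTP/1.1" 206 200 "UA"\n'
        '10.0.0.1 - - [19/Dec/2022:04:53:49 +0000] "GET /video2.mp4 HTTP/1.1" 206 300 "UA"\n'
        '10.0.0.2 - - [19/Dec/2022:04:53:50 +0000] "GET /video2.mp4 HTTP/1.1" 206 400 "UA"\n'
    )
    # The second line is dropped; a change of file or of IP starts a new run.
    print(consolidate(sample), end="")
```

Because the pattern only looks at the IP and the quoted request, the timestamp and byte count are ignored, which is what you asked for with groups 2 and 5. Note that the status code and user agent are ignored too; if those should also break a run, the pattern would need extra captures.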
