When I want to merge log files, I often use cat logA.log logB.log | sort. As long as the log lines start with some timestamp-like string in a common format, that's fine.
But can I somehow sort the lines while keeping lines that do (or do not) match a certain pattern attached to the line that originally preceded them? Just think of a log file where somebody logged a message containing line breaks (without me knowing about it)!
(berta.log)
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
(caesar.log)
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
at Conversation.parseStatement
at Conversation.considerReplyToStatement
at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
These two log files would of course become unusable if merged with cat berta.log caesar.log | sort.
I also am really unsure if I should post this question to StackOverflow or to Superuser or even to Unix or ServerFault...
Edit for clarity
The merged logs should look e.g. like this:
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
at Conversation.parseStatement
at Conversation.considerReplyToStatement
at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
CodePudding user response:
Classic problem of mixing lines and files.
A solution: Put your multiline log lines on one line
- Executable script:
./onelinelog.awk
#! /usr/bin/awk -f
# Timestamp line
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
if (log_line != "") { print log_line }
log_line = $0
next
}
# Other line
{
# Here, I use '§' to separate the original lines within the joined record
log_line = log_line "§" $0
}
# End of file
END {
if (log_line != "") { print log_line }
}
Test on the caesar.log file:
$ ./onelinelog.awk caesar.log
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§ at Conversation.parseStatement§ at Conversation.considerReplyToStatement§ at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
- Sort:
cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort
or
sort <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log)
Output:
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§ at Conversation.parseStatement§ at Conversation.considerReplyToStatement§ at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
Fun?
You may want to recover your original lines...
Use sed (this relies on GNU sed interpreting \n in the replacement):
$ cat and/or sort ... | sed -e 's/§/\n/g'
or another executable awk script: ./tomultilinelog.awk
#! /usr/bin/awk -f
BEGIN {
FS="§"
}
{
for (i = 1; i <= NF; i++) { print $i }
}
So execute:
$ cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort | ./tomultilinelog.awk
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
at Conversation.parseStatement
at Conversation.considerReplyToStatement
at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
Of course, you could adapt the code and replace the '§' character with another token.
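As one possible adaptation, here is a minimal sketch of the same fold, sort, unfold round trip as a single shell function. The names fold_log and merge_logs are made up for this example. It assumes the ASCII unit separator (octal 037) never occurs in the logs, and it uses sort -s, a stable-sort option offered by GNU and BSD sort but not required by POSIX, so that entries with identical timestamps keep the order in which the files were given.

```shell
#!/usr/bin/env bash
# Sketch only: fold_log and merge_logs are hypothetical helper names.

# fold_log FILE: join every non-timestamp line onto the preceding
# timestamp line, using the unit separator (octal 037) as the glue.
fold_log() {
  awk '
    BEGIN { sep = "\037" }
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
      if (rec != "") print rec
      rec = $0
      next
    }
    { rec = (rec == "" ? $0 : rec sep $0) }
    END { if (rec != "") print rec }
  ' "$1"
}

# merge_logs FILE...: fold each file, sort on the date and time fields only
# (stable, so ties keep their input order), then restore the line breaks.
merge_logs() {
  local f
  for f in "$@"; do
    fold_log "$f"
  done | LC_ALL=C sort -s -k1,2 | tr '\037' '\n'
}

# Usage: merge_logs berta.log caesar.log
```

Because the separator is a single byte, tr is enough to restore the line breaks, which sidesteps the question of how a given sed handles \n in a replacement.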
CodePudding user response:
I came up with another awk solution while Arnaud Valmary was posting his.
In my attempt, I just prefix every line that does not start with a timestamp with the last timestamp seen (and a running number):
prefixAllLines.awk
#! /usr/bin/awk -f
# Note: gensub() is a GNU awk (gawk) extension, so awk must be gawk here
BEGIN {
linePattern="^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) (.*)"
}
{
if ($0~linePattern){
number=0
linePrefix=gensub(linePattern, "\\1", "g", $0)
lineRest=gensub(linePattern, "\\2", "g", $0)
printf "%s ", linePrefix
printf "%03d", number
printf " %s\n", lineRest
} else {
number += 1
printf "%s ", linePrefix
printf "%03d", number
printf " %s\n", $0
}
}
So, ./prefixAllLines.awk caesar.log gives:
2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001 at Conversation.parseStatement
2021-10-01 00:00:20 002 at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003 at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!
And cat <(./prefixAllLines.awk caesar.log) <(./prefixAllLines.awk berta.log) | sort gives:
2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:10 000 Hey!
2021-10-01 00:00:11 000 How are you doing, Adam?
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001 at Conversation.parseStatement
2021-10-01 00:00:20 002 at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003 at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!
But I like Arnaud Valmary's approach much more. :-)
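The sorted output of the prefixing approach still carries the helper columns. A minimal sketch for stripping them again, assuming the exact layout shown above (a 19-character timestamp, a space, a 3-digit counter, a space, then the text); the function name unprefix is made up for this example. One caveat: the prefix does not record the source file, so if two files contain entries with the same timestamp, their continuation lines may end up interleaved after sorting.

```shell
#!/usr/bin/env bash
# Sketch only: unprefix is a hypothetical helper name.

# unprefix: read "<timestamp> <NNN> <text>" lines on stdin and rebuild the
# original layout. Lines with counter 000 keep their timestamp; continuation
# lines (counter 001 and up) are printed without it.
unprefix() {
  awk '{
    stamp = substr($0, 1, 19)   # "YYYY-MM-DD HH:MM:SS"
    count = substr($0, 21, 3)   # zero-padded running number
    text  = substr($0, 25)      # the original line content
    if (count == "000") print stamp " " text
    else print text
  }'
}

# Usage: sort <(./prefixAllLines.awk caesar.log) <(./prefixAllLines.awk berta.log) | unprefix
```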