Can I sort with context in bash?

When I want to merge log files, I often use cat logA.log logB.log | sort. As long as the log lines start with some timestamp-like string in a common format, that's fine.

But can I somehow sort the lines and keep lines that do(n't) follow a certain rule glued to their original leading line? Just think of a log file where somebody logged something with linebreaks in it (without me knowing that)!

(berta.log)
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?

(caesar.log)
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

These two log files of course would become unusable if merged with cat berta.log caesar.log | sort.

(I am also really unsure whether I should post this question on StackOverflow, Superuser, Unix or ServerFault...)

Edit for clarity

The merged logs should look e.g. like this:

2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

CodePudding user response：

Classic problem of mixing lines and files.

A solution: join each multi-line log entry into a single line before sorting.

Executable script: ./onelinelog.awk

#! /usr/bin/awk -f

# Timestamp line
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
    if (log_line != "") { print log_line }
    log_line = $0
    next
}
# Other line
{
    # Here, I use '§' to separate the original lines
    log_line = log_line "§" $0
}
# End of file
END {
    if (log_line != "") { print log_line }
}

Test on caesar.log file:

$ ./onelinelog.awk caesar.log 
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§    at Conversation.parseStatement§    at Conversation.considerReplyToStatement§    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

Sort:

cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort

or, without the extra cat:

sort <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log)

Output:

2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§    at Conversation.parseStatement§    at Conversation.considerReplyToStatement§    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

Fun?

You may want to recover your original lines...

Use sed (GNU sed treats \n in the replacement as a newline):

$ cat and/or sort ... | sed -e 's/§/\n/g'

or another executable awk script: ./tomultilinelog.awk

#! /usr/bin/awk -f
BEGIN {
    FS="§"
}
{
    for (i = 1; i <= NF; i += 1) { print $i }
}

So execute:

$ cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort | ./tomultilinelog.awk 
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

Of course, you could adapt the code and replace the '§' character with another token.
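As a rough sketch of that adaptation, the join, sort and split steps can be wrapped in one bash function. The function name merge_logs is made up for this example. The separator is assumed to be the ASCII unit separator (octal 037), which should rarely appear in ordinary log text. The -s (stable) option is assumed to be available, as it is in GNU and BSD sort; together with -k1,2 it makes sort compare only the timestamp, so entries with identical timestamps stay in input order.

```shell
#!/usr/bin/env bash
# Sketch: merge timestamped logs while keeping continuation lines attached.
# Assumption: the byte 037 (ASCII unit separator) never occurs in the logs.

merge_logs() {
    local f
    for f in "$@"; do
        awk -v sep="$(printf '\037')" '
            # A line starting with "YYYY-MM-DD HH:MM:SS " begins a new record.
            /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
                if (rec != "") print rec
                rec = $0
                next
            }
            # Any other line is appended to the current record.
            { rec = (rec == "" ? $0 : rec sep $0) }
            # Flush the last record of the file.
            END { if (rec != "") print rec }
        ' "$f"
    done |
        LC_ALL=C sort -s -k1,2 |   # compare the date and time fields only
        tr '\037' '\n'             # split the records back into lines
}
```

Usage would be merge_logs berta.log caesar.log, which prints the merged log on standard output.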

CodePudding user response：

I came up with another awk solution while Arnaud Valmary was posting his.

In my attempt, I just prefixed all lines that do not start with a timestamp with the last timestamp (and a number):

prefixAllLines.awk (needs GNU awk, because gensub is a gawk extension)

#! /usr/bin/awk -f

BEGIN { 
    linePattern="^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) (.*)" 
}
{ 
    if ($0~linePattern){
        number=0
        linePrefix=gensub(linePattern, "\\1", "g", $0)
        lineRest=gensub(linePattern, "\\2", "g", $0)
        printf "%s %03d %s\n", linePrefix, number, lineRest
    } else {
        number += 1
        printf "%s %03d %s\n", linePrefix, number, $0
    }
}

So, ./prefixAllLines.awk caesar.log gives:

2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001         at Conversation.parseStatement
2021-10-01 00:00:20 002         at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003         at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!

And cat <(./prefixAllLines.awk caesar.log) <(./prefixAllLines.awk berta.log) | sort:

2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:10 000 Hey!
2021-10-01 00:00:11 000 How are you doing, Adam?
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001         at Conversation.parseStatement
2021-10-01 00:00:20 002         at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003         at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!
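
To get back to the original layout after sorting, the added prefix has to be removed again. A minimal sketch, assuming the fixed-width "timestamp, space, three-digit number, space" prefix shown above (so at most 999 continuation lines per entry); the function name strip_prefix is made up for this example:

```shell
#!/usr/bin/env bash
# Sketch: undo the "<timestamp> <NNN> " prefix after sorting.
# Assumption: the timestamp is exactly 19 characters wide and the
# counter is exactly three digits.

strip_prefix() {
    awk '{
        ts   = substr($0, 1, 19)   # "YYYY-MM-DD HH:MM:SS"
        n    = substr($0, 21, 3)   # three-digit counter
        rest = substr($0, 25)      # original text of the line
        if (n == "000")
            print ts " " rest      # leading line: keep its timestamp
        else
            print rest             # continuation line: drop the prefix
    }'
}
```

It would be appended to the pipeline, e.g. cat <(./prefixAllLines.awk caesar.log) <(./prefixAllLines.awk berta.log) | sort | strip_prefix.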

But I like Arnaud Valmary's approach much more. :-)