Grepping records from log file using custom separator(regex) and matching keywords

Time:10-12

I need to find records in an application log file that contain specific keywords. I tried grep, but it treats \n as the record separator, so log entries whose messages contain \n are only partially returned.

A sample application log file (each line shown below is a separate line, i.e. it ends with \n):

2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}
2017-11-22 03:12:23 LogManager : Currently processing data {Name: Dummy}
Fetching last name
{LastName : Value}
SomeRandomMessage
2017-11-22 03:12:23 LogManager : SomeRandomMessage
Currently processing data {Name: Dummy2}
Fetching last name
SomeRandomMessage
{LastName : Value3}
.
.
.
.

I want to use YYYY-MM-DD HH:MM:SS as the record separator and then, within each record, check whether it contains both Hello and World (for example).

Expected output :

2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}

What I've tried :

grep 'Hello' fileName
>>
2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}

CodePudding user response:

Using any POSIX awk:

$ cat tst.awk
/^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2} / {
    prt()
    rec = $0
    next
}
{ rec = rec ORS $0 }
END {
    prt()
}

function prt() {
    if ( (rec ~ /Hello/) && (rec ~ /World/) ) {
        print rec
    }
}

$ awk -f tst.awk file
2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}
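
The script above hard-codes the keywords as regular expressions. As a minimal sketch (not part of the original answer), the same record-building logic can take the keywords from the command line and compare them as literal strings with index(), which may help when a keyword contains regex metacharacters such as { or a dot. The variable names k1 and k2 and the file name sample.log are made up for illustration, and the timestamp pattern is spelled out without {n} intervals on the assumption that some older awk builds do not support them:

```shell
# Build a small sample log (hypothetical file name).
cat > sample.log <<'EOF'
2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}
2017-11-22 03:12:23 LogManager : Currently processing data {Name: Dummy}
Fetching last name
{LastName : Value}
EOF

# k1 and k2 are illustrative variable names holding the literal keywords.
out=$(awk -v k1='{Name: Hello}' -v k2='{LastName : World}' '
# Print the buffered record if it contains both keywords as plain substrings.
function prt() {
    if (index(rec, k1) && index(rec, k2)) print rec
}
# A line starting with "YYYY-MM-DD HH:MM:SS " begins a new record.
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
    prt(); rec = $0; next
}
{ rec = rec ORS $0 }    # continuation line: append to the current record
END { prt() }           # flush the last record
' sample.log)
printf '%s\n' "$out"
```

With this sample only the first entry is printed, because it is the only record containing both literal strings.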

CodePudding user response:

I want to use YYYY-MM-DD HH:MM:SS as a record separator

You may use this gnu-awk command:

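awk -v RS='[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2}' '
   /Hello/ && /World/ {printf "%s", prev $0}
   {prev = RT}' file

2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}

Here -v RS='[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2}' sets the record separator to the date-time string. In gawk, RT holds the separator text that terminated the current record, which is the timestamp of the next log entry, so the script saves RT in prev after each record. When a record matches both Hello and World, it is printed prefixed with prev, the timestamp that preceded it.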

CodePudding user response:

UPDATE - original answer (see edit/revision) made a few assumptions based on OP's sample input; OP has since stated (per comments) that the sample input is not representative of an actual log file ...

Per comments from OP:

  • search pattern(s) won't necessarily reside in the 1st (and/or last) line of a log entry
  • a log entry may have a variable number of lines
  • cannot rely on the string LastName being on the last line of the log entry (and at this point I'm going to assume a log entry may not even contain the string LastName)

Assumptions:

  • a log entry will always start with a date of the format YYYY-MM-DD as the first field of the first line of said log entry
  • a search pattern will not span multiple lines
  • need to support up to 2 search patterns (more could be added with a redesign)

Adding some additional sample data:

$ cat fileName
2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}
last line for this entry
2017-11-22 03:12:23 LogManager : Currently processing data {Name: Dummy}
Fetching last name
{LastName : Value}
last line?
nope, this is the last line
2017-11-22 05:17:33 LogManager : Currently processing data {Name: Dummy2}
Fetching last name
{LastName : Value3}
2017-11-22 12:13:02 LogManager : Currently processing data {Name: WhoYaCalinDummy}
Fetching last name
{LastName : WhoMe}

One awk idea:

findme1='Hello'
findme2='World'

awk -v ptn1="${findme1}" -v ptn2="${findme2}" '

function test_and_print() {
    if (log_entry ~ ptn1 && log_entry ~ ptn2)                        # if ptn1/ptn2 show up anywhere in our log entry then ...
       print log_entry                                               # print it to stdout
    log_entry=""                                                     # reset our variable
}

BEGIN           { date_regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}$" }

$1 ~ date_regex { test_and_print() }                                 # new log entry so test the previous entry
                { log_entry= log_entry (log_entry ? RS : "") $0 }    # append current line to current log entry
END             { test_and_print() }                                 # test the last entry
' fileName

For findme1='Hello'; findme2='World' this generates:

2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}
last line for this entry

For findme1='Hello'; findme2='Bob' this generates:

 # no output

For findme1='WhoMe'; findme2='' this generates:

2017-11-22 12:13:02 LogManager : Currently processing data {Name: WhoYaCalinDummy}
Fetching last name
{LastName : WhoMe}

For findme1='XXXX'; findme2='' this generates:

 # no output
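
The two-pattern limit listed in the assumptions could be lifted by passing the patterns as one delimited string and splitting it in BEGIN. A minimal sketch, not from the original answer: the variable name ptns, the comma delimiter (so a pattern cannot itself contain a comma) and the file name entries.log are assumptions made for illustration, and the date test is written without {n} intervals in case the local awk lacks them.

```shell
cat > entries.log <<'EOF'
2017-11-22 01:43:36 LogManager : Currently processing data {Name: Hello}
Fetching last name
{LastName : World}
last line for this entry
2017-11-22 03:12:23 LogManager : Currently processing data {Name: Dummy}
Fetching last name
{LastName : Value}
EOF

# ptns is a hypothetical variable: a comma-separated list of regex patterns.
out=$(awk -v ptns='Hello,World,last line' '
function test_and_print(    i) {
    if (log_entry != "") {
        for (i = 1; i <= n; i++)
            if (log_entry !~ p[i]) break     # stop at the first missing pattern
        if (i > n) print log_entry           # loop finished: every pattern matched
    }
    log_entry = ""                           # reset for the next entry
}
BEGIN { n = split(ptns, p, ",") }
$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ { test_and_print() }
{ log_entry = log_entry (log_entry != "" ? ORS : "") $0 }
END { test_and_print() }
' entries.log)
printf '%s\n' "$out"
```

An entry is printed only when every pattern in the list matches somewhere in it, so here only the first entry appears.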

CodePudding user response:

Since these are logs and you mentioned that they all share the same format, you could try the following code in GNU awk. The regex used is (^|\n)[0-9]{4}(-[0-9]{2}){2} ([0-9]{2}:){2}[0-9]{2} LogManager : [^{]*{Name: Hello}\nFetching last name\n{LastName : World. Note that -v RS="" puts awk in paragraph mode, so a log file without blank lines is read as a single record.

awk -v RS="" '
{while(match($0,/(^|\n)[0-9]{4}(-[0-9]{2}){2} \
([0-9]{2}:){2}[0-9]{2} LogManager : \
[^{]*{Name: Hello}\nFetching last name\n\
{LastName : World/)){
  print substr($0,RSTART,RLENGTH)
  $0=substr($0,RSTART+RLENGTH)
}
}
'   Input_file