Check if file contains same text in consecutive lines

Using bash, I want to check whether a log file has any place where two or more consecutive lines contain the same text. The text will be specified. The timestamp, and any text after the third field, should be ignored in the comparison.

e.g. grep ... "error" /tmp/file.txt

This file should match:

2020-01-01 05:05 text1
2020-01-01 05:07 error
2020-01-01 05:15 error
2020-01-01 05:25 error
2020-01-01 05:45 text2

This one shouldn't:

2020-01-01 05:05 text1
2020-01-01 05:15 error
2020-01-01 05:25 text2
2020-01-01 05:45 error
2020-01-01 05:05 text3

Any ideas using grep, sed or awk? Ideally I'd like an exit status of 0 for a match and 1 for no match.

Answer:

Looks like uniq does most of what you need (on its own it won't restrict the check to the specified text or change its exit status):

-d, --repeated
only print duplicate lines, one for each group

-s, --skip-chars=N
avoid comparing the first N characters

So this should work for you (the first 17 characters are the "2020-01-01 05:05 " date and time prefix):

uniq --skip-chars=17 -d /tmp/file.txt

Tested on my machine:

$ cat in.txt 
2020-01-01 05:05 text1
2020-01-01 05:07 error
2020-01-01 05:15 error
2020-01-01 05:25 error
2020-01-01 05:45 text2

$ uniq --skip-chars=17 -d in.txt 
2020-01-01 05:07 error
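Since uniq always exits 0 and prints repeats of any text, not only the one you're searching for, here is a minimal sketch of one way to get the exit status you asked for. It assumes the date and time are always the first two blank-separated fields, so it uses uniq's -f 2 (skip the first two fields) instead of a fixed character count, then pipes the repeated lines into awk, which exits 0 only if the searched text is among them. The function name has_repeat is hypothetical.

```shell
#!/usr/bin/env bash
# has_repeat TEXT FILE (hypothetical helper name)
# Exits 0 if TEXT is the third field on two or more consecutive lines
# of FILE, 1 otherwise.
has_repeat() {
    local text=$1 file=$2
    # -f 2: ignore the date and time fields when comparing adjacent lines
    # -d:   print one line for each group of adjacent duplicate lines
    uniq -f 2 -d "$file" |
        awk -v s="$text" '$3 == s { found = 1 } END { exit !found }'
}

# Usage with the matching sample from the question
sample=$(mktemp)
printf '%s\n' \
    '2020-01-01 05:05 text1' \
    '2020-01-01 05:07 error' \
    '2020-01-01 05:15 error' \
    '2020-01-01 05:25 error' \
    '2020-01-01 05:45 text2' > "$sample"

if has_repeat error "$sample"; then
    echo "consecutive 'error' lines found"
fi
rm -f "$sample"
```

The exit status of the pipeline is that of awk (the last command), so the function can be used directly in an if.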

Answer:

Here's one in awk. To me, testing for two or more consecutive lines means exiting as soon as two consecutive matching lines are found:

$ awk -v s="word" '{    # search word as a parameter
    if($3==p&&$3==s)    # if third word is the same as from previous round
        exit ec=1       # and the same as the search word, exit right away
    else 
        p=$3            # else just store the last word for next round
}
END {                   # in the end
    exit !ec            # flip the error code and exit
}' file

Test it:

$ awk -v s=error '{if($3==p&&$3==s)exit ec=1;else p=$3}END{exit !ec}' matching
$ echo $?
0
$ awk -v s=error '{if($3==p&&$3==s)exit ec=1;else p=$3}END{exit !ec}' nonmatching
$ echo $?
1
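To call this from a script, the one-liner can be wrapped in a shell function and tested directly with if. A minimal sketch; the function name consecutive_match and the temporary file are hypothetical, and the awk program is the same logic as above, restructured into pattern/action rules.

```shell
#!/usr/bin/env bash
# consecutive_match WORD FILE (hypothetical helper name)
# Exits 0 as soon as WORD is the third field of two adjacent lines, 1 otherwise.
consecutive_match() {
    awk -v s="$1" '
        $3 == p && $3 == s { ec = 1; exit }   # previous line had the same word
        { p = $3 }                            # remember the word for the next line
        END { exit !ec }                      # 0 if found, 1 otherwise
    ' "$2"
}

# Usage with the non-matching sample from the question
log=$(mktemp)
printf '%s\n' \
    '2020-01-01 05:05 text1' \
    '2020-01-01 05:15 error' \
    '2020-01-01 05:25 text2' \
    '2020-01-01 05:45 error' > "$log"

if consecutive_match error "$log"; then
    echo "found"
else
    echo "no consecutive 'error' lines"
fi
rm -f "$log"
```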

In the sample above, only the third word (space-separated field) of each line is considered. If you're looking for a string longer than one word, consider replacing $3 with substr($0,n), where n is 18 in your sample (the position where the text after the date and time starts):

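$ awk -v s=error '{
    t=substr($0,18)     # everything after the "YYYY-MM-DD HH:MM " prefix
    if(t==p&&t==s)      # same as the previous line and the search text
        exit ec=1
    else
        p=t             # remember this text for the next round
}
END {
    exit !ec
}' file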
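A fixed offset of 18 breaks if the timestamp width changes (for example, if some entries include seconds). Here is a sketch of an alternative, assuming the date and time are always exactly the first two space-separated fields: strip them with awk's sub() and compare whatever remains. The function name consecutive_text and the "disk full" sample lines are hypothetical.

```shell
#!/usr/bin/env bash
# consecutive_text TEXT FILE (hypothetical helper name)
# Compares everything after the first two space-separated fields (date, time),
# so multi-word messages work and the timestamp width does not matter.
consecutive_text() {
    awk -v s="$1" '
        {
            t = $0
            sub(/^[^ ]+ +[^ ]+ +/, "", t)   # drop the date and time fields
            if (t == p && t == s) { ec = 1; exit }
            p = t                            # remember the text for the next line
        }
        END { exit !ec }
    ' "$2"
}

# Usage with hypothetical multi-word messages and second-resolution times
log=$(mktemp)
printf '%s\n' \
    '2020-01-01 05:05:10 text1' \
    '2020-01-01 05:07:30 disk full' \
    '2020-01-01 05:15:02 disk full' > "$log"
consecutive_text 'disk full' "$log" && echo "found"
rm -f "$log"
```

Compared with substr(), the regex costs a little more per line but does not depend on the exact column where the message begins.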