Grep: finding which of the patterns are NOT in the file

Time:09-17

I'm trying to find out (with grep) which of the patterns in my pattern file don't appear in a log file.
I have a file input.txt which contains:

00123
00124
00125
00126

and a log file 20210716.log:

00123
a
b
c
d
00125
00126
xy
z
...
(tons of text)
...
00127

When I run grep -f input.txt 20210716.log, the output is:

00123
00125
00126

How can I output the patterns from input.txt that don't appear in the log file? I would like to get:

00124

CodePudding user response:

You may try this grep (file.log stands for the log file, 20210716.log in the question):

grep -vFf file.log input.txt

00124

Or else you can use awk like this:

awk 'NR == FNR {seen[$1]; next} !($0 in seen)' file.log input.txt

00124
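
To try this on the question's data, here is a minimal sketch that recreates shortened versions of both files in a scratch directory (the directory and the `missing` variable are only for illustration). It adds -x, which is my own assumption rather than part of the answer above: it makes grep compare whole lines, so a short log line such as 0 cannot hide every ID that happens to contain it. Either way, this only works when each ID sits on a line of its own in the log.

```shell
#!/bin/sh
# Recreate shortened versions of the question's files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 00126 > "$tmp/input.txt"
printf '%s\n' 00123 a b c d 00125 00126 xy z 00127 > "$tmp/20210716.log"

# -f: read the patterns from the log file
# -F: treat them as fixed strings, not regular expressions
# -x: a pattern must equal the whole line (assumption added for safety)
# -v: keep only the lines of input.txt that match none of the patterns
missing=$(grep -vxFf "$tmp/20210716.log" "$tmp/input.txt")
echo "$missing"

rm -rf "$tmp"
```

With the sample data this prints 00124.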

CodePudding user response:

It depends a bit on what you really want. You talk about patterns, and matching patterns is tough. For example, if your input file contains words that should be matched, you can use the following:

grep -woFf input.txt file.log | grep -vwFf - input.txt

This reads input.txt as a list of patterns to search for (-f), treated as fixed strings rather than regular expressions (-F). It matches only whole words (-w) and prints only the matched text (-o). The output is piped into a second grep, which reads the found words from standard input as fixed-string patterns (-wFf -) and inverts the match (-v), so only the lines of input.txt that were not found remain. The second grep must not use -o: at least in GNU grep, combining -v with -o prints nothing, because the selected lines have no matched part to show.

The problem here is that if input.txt contains actual regular expressions, the reverse grep does not work: the first grep reports the matched text foo, and you cannot use that text to cross off the regex fo* that produced it in input.txt.
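
The limitation is easy to reproduce. This is a small sketch with made-up file contents (the patterns fo* and bar, and the variables `found` and `not_found`, are purely illustrative); the first grep drops -F so that the patterns really are treated as regular expressions.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'fo*' 'bar' > "$tmp/input.txt"   # the first pattern is a regex
printf '%s\n' 'foo' > "$tmp/file.log"

# Step 1: match the patterns as regexes and print only the matched text.
found=$(grep -wof "$tmp/input.txt" "$tmp/file.log")

# Step 2: try to cross the found words off the pattern list.
# The literal text "foo" does not occur in the line "fo*", so that
# pattern is wrongly reported as not found.
not_found=$(printf '%s\n' "$found" | grep -vwFf - "$tmp/input.txt")

echo "found:     $found"
echo "not found: $not_found"

rm -rf "$tmp"
```

Here fo* ends up in the "not found" list even though it matched foo in the log, which is exactly the failure described above.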

A more bulletproof match would be to make use of awk:

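awk '(NR==FNR){a[$1];next}                 # input.txt: remember each pattern
     {for(r in a) if ($0 ~ r) a[r]=1}      # log line: flag every pattern it matches
     END{for(r in a) if (a[r]==0) print r} # print the patterns never flagged
    ' input.txt file.log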
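
As a quick check of the awk idea with real regular expressions, here is a sketch on invented data (the three patterns, the log lines, and the names `hit` and `unmatched` are assumptions for illustration). Each line of input.txt is used as a dynamic regex and tested against every log line.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 'fo*' '^ba[rz]$' '0012[0-9]x' > "$tmp/input.txt"
printf '%s\n' 'foo' 'baz' '00123' > "$tmp/file.log"

unmatched=$(awk '
    NR == FNR { hit[$0] = 0; next }             # first file: one regex per line
    { for (r in hit) if ($0 ~ r) hit[r] = 1 }   # log line: flag each regex it matches
    END { for (r in hit) if (!hit[r]) print r } # report regexes that never matched
' "$tmp/input.txt" "$tmp/file.log")
echo "$unmatched"

rm -rf "$tmp"
```

Only 0012[0-9]x is printed, since no log line matches it. Note that every log line is tested against every pattern, so the run time grows with patterns times lines, and the order in which awk walks the array (and therefore the output order) is unspecified.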

CodePudding user response:

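You could also use join for this. -v1 prints only the lines of the first file (input.txt) that have no match in the second, instead of the normal joined output.

join requires both inputs to be sorted on the join field (the first field by default), hence the sort calls: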

join -v1 <(sort input.txt) <(sort 20210716.log)
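
The <( ... ) process substitution is a bash, ksh, and zsh feature. In a plain POSIX sh the same thing can be done with sorted temporary files, as in this sketch (the scratch directory, the .sorted file names, and the `unpaired` variable are illustrative assumptions):

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 00126 > "$tmp/input.txt"
printf '%s\n' 00123 a b c d 00125 00126 xy z 00127 > "$tmp/20210716.log"

# join expects both inputs sorted in the same collation order.
sort "$tmp/input.txt" > "$tmp/input.sorted"
sort "$tmp/20210716.log" > "$tmp/log.sorted"

# -v 1: print only the lines of the first file whose join field (the first
# whitespace-separated field by default) has no partner in the second file.
unpaired=$(join -v 1 "$tmp/input.sorted" "$tmp/log.sorted")
echo "$unpaired"

rm -rf "$tmp"
```

With the sample data this prints 00124. Because join compares only the first field, an ID that appears in the middle of a longer log line would not count as a match.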