I'm trying to find out (with grep) which of the patterns in my file don't appear in a log file.
I have a file input.txt which contains:
00123
00124
00125
00126
and a log file 20210716.log:
00123
a
b
c
d
00125
00126
xy
z
...
(tons of text)
...
00127
When using grep -f input.txt 20210716.log, I get this output:
00123
00125
00126
How can I output the patterns from input.txt that don't appear in the log file? I would like to get:
00124
CodePudding user response:
You may try this grep:
grep -vFf file.log input.txt
00124
Or else you can use awk like this:
awk 'NR == FNR {seen[$1]; next} !($0 in seen)' file.log input.txt
00124
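Here is a minimal, self-contained sketch of both commands run against the sample data from the question. The scratch directory and the variable names (tmp, missing_grep, missing_awk) are only for illustration, and the log is written as file.log to match the commands above.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 00126 > "$tmp/input.txt"
printf '%s\n' 00123 a b c d 00125 00126 xy z 00127 > "$tmp/file.log"

# Every log line becomes a fixed-string pattern (-F -f);
# -v keeps the input lines that match none of them.
missing_grep=$(grep -vFf "$tmp/file.log" "$tmp/input.txt")

# awk: remember the first field of every log line,
# then print the input lines that were never seen.
missing_awk=$(awk 'NR == FNR {seen[$1]; next} !($0 in seen)' "$tmp/file.log" "$tmp/input.txt")

echo "$missing_grep"
echo "$missing_awk"
rm -rf "$tmp"
```

Note that the grep variant matches substrings: a log line consisting of just 0 would hide every input line containing a 0. If the log lines are expected to equal the patterns exactly, adding -x (whole-line match) is probably safer.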
CodePudding user response:
It depends a bit on what you really want. You talk about patterns, and matching patterns is tough. For example, if your input file contains plain words that should be matched, you can use the following:
$ grep -woFf input.txt file.log | grep -vwFf - input.txt
This reads the file input.txt as a list of patterns to search for (-f), treating them as fixed strings rather than regular expressions (-F). It also matches only whole words (-w) and prints only the matched text (-o). The output of this command is piped into a second grep, which does an inverted (-v) match of input.txt against all the found words, again as fixed whole words (-wFf -). The second grep must not use -o: GNU grep prints nothing when -o is combined with -v.
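A small sketch of that two-stage pipeline, assuming log lines where the IDs are embedded in other text (the sample data and variable names here are made up for illustration). The second grep uses -v without -o, because GNU grep prints nothing when the two are combined.

```shell
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 > "$tmp/input.txt"
# Hypothetical log lines: IDs appear as words inside longer lines.
printf '%s\n' 'job 00123 started' 'job 001250 failed' 'job 00125 done' > "$tmp/file.log"

# Stage 1: print only the IDs that occur as whole words in the log.
# "001250" does not count as a hit for 00125 because of -w.
# Stage 2: read those IDs from stdin (-f -) and keep the input lines
# that match none of them.
missing=$(grep -woFf "$tmp/input.txt" "$tmp/file.log" | grep -vwFf - "$tmp/input.txt")

echo "$missing"
rm -rf "$tmp"
```

Unlike running grep -vFf with the log as the pattern file, this still works when each log line contains more than just the ID.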
The problem here is that if input.txt contains actual regular expressions, the reverse grep does not work: you cannot search for the literal text foo and expect it to match the regex fo*, which could appear in input.txt.
A more bulletproof approach is to use awk:
awk '(NR==FNR){a[$1];next}
{for(r in a) if ($0 ~ r) a[r]=1}
END{for(r in a) if (a[r]==0) print r}
' input.txt file.log
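To see the regex case in action, here is a sketch that marks each pattern once any log line matches it and prints the patterns that were never marked. The patterns and log lines below are invented for the example, and the loop body is written as an explicit per-line test ($0 ~ r).

```shell
tmp=$(mktemp -d)
# Hypothetical patterns: real regular expressions, one per line.
printf '%s\n' '^0012[35]$' '^0099[0-9]$' 'xy+' > "$tmp/input.txt"
printf '%s\n' 00123 a xy 00127 > "$tmp/file.log"

unmatched=$(awk '(NR==FNR){a[$1];next}            # load patterns as array keys
{for (r in a) if ($0 ~ r) a[r]=1}                  # flag patterns this log line matches
END{for (r in a) if (a[r]==0) print r}             # report patterns never flagged
' "$tmp/input.txt" "$tmp/file.log")

echo "$unmatched"
rm -rf "$tmp"
```

Two caveats: the order of for (r in a) is unspecified in awk, so pipe the result through sort if the order matters, and because the patterns are stored via $1, a pattern containing whitespace would be cut at the first blank.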
CodePudding user response:
You could also use join for this. -v1 prints only the lines of input.txt that have no match in the log, suppressing the paired lines. join requires that both inputs be sorted:
join -v1 <(sort input.txt) <(sort 20210716.log)
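The <(...) process substitution above is a bash feature. As a rough sketch for a plain POSIX shell, the same thing can be done with sorted temporary files (the file names input.sorted and log.sorted are just placeholders):

```shell
tmp=$(mktemp -d)
printf '%s\n' 00123 00124 00125 00126 > "$tmp/input.txt"
printf '%s\n' 00123 a b c d 00125 00126 xy z 00127 > "$tmp/20210716.log"

# join needs both inputs sorted on the join field (the first field by default).
sort "$tmp/input.txt" > "$tmp/input.sorted"
sort "$tmp/20210716.log" > "$tmp/log.sorted"

# -v 1: print only the lines of the first file that have no partner in the second.
missing=$(join -v 1 "$tmp/input.sorted" "$tmp/log.sorted")

echo "$missing"
rm -rf "$tmp"
```

Keep in mind that join compares only the first whitespace-separated field of each line, so this fits logs where the ID is the first thing on the line.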