How can I find a pattern in one file that doesn't match any line of another file?
I'm aware that grep has a -f option, so instead of feeding grep a pattern, I can feed it a file of patterns.
(a.a is my main file)
user@system:~/test# cat a.a
Were Alexander-ZBn1gozZoEM.mp4
Will Ate-vP-2ahd8pHY.mp4
(p.p is my file of patterns)
user@system:~/test# cat p.p
ZBn1gozZoEM
0maL4cQ8zuU
vP-2ahd8pHY
So the command might be something like
somekindofgrep p.p a.a
but it should give 0maL4cQ8zuU
which is the pattern in the file of patterns, p.p, that doesn't match anything in the file a.a
I am not sure what command to use.
$grep -f p.p a.a<ENTER>
Were Alexander-ZBn1gozZoEM.mp4
Will Ate-vP-2ahd8pHY.mp4
$
I know that if there were an additional line in a.a not matched by any pattern in p.p, then grep -f p.p a.a would not show it, and grep -v -f p.p a.a would show only that line of a.a, the one not matched by any pattern in p.p.
But I'm interested in finding which pattern in p.p (my file of patterns) doesn't match anything in a.a!
I looked at "Make grep print missing queries", but the asker there wants everything from both files. Also, one of the answers there mentions -v, but I can't see that applying to my case, because -v shows the lines of a file that don't match any pattern. So having or not having -v won't help me, because I'm looking for a pattern that doesn't match any line of a file.
CodePudding user response:
Homemade script:
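#!/bin/bash
# Validate the arguments: exactly two existing files are required.
if [[ $# -eq 2 ]]
then
    patterns="$1"
    mainfile="$2"
    if [[ ! -f "$patterns" ]]
    then
        echo "ERROR: file $patterns does not exist." >&2
        exit 1
    fi
    if [[ ! -f "$mainfile" ]]
    then
        echo "ERROR: file $mainfile does not exist." >&2
        exit 1
    fi
else
    echo "Usage: $0 <PATTERNS FILE> <MAIN FILE>" >&2
    exit 1
fi
# Print every pattern that matches no line of the main file.
while IFS= read -r pattern
do
    # -q stops at the first match; -e keeps a pattern starting with "-" from being read as an option
    if ! grep -q -e "$pattern" "$mainfile"
    then
        printf '%s\n' "$pattern"
    fi
done < "$patterns"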
As user1934428 suggested, this script loops over the patterns in file p.p and prints any pattern that is not found in file a.a.
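To make the loop easier to try out, here is a minimal sketch of the same idea wrapped in a function and run against the sample data from the question. The function name unmatched_patterns and the temporary directory are only illustrative. It also adds -F, on the assumption that the lines in p.p are literal strings (such as video IDs) rather than regular expressions; drop -F if they really are regexes.

```shell
# Hypothetical helper: print each line of the patterns file ($1)
# that occurs in no line of the main file ($2).
unmatched_patterns() {
    while IFS= read -r pattern; do
        # -F: literal string, -q: quiet, -e: safe if the pattern starts with "-"
        grep -Fq -e "$pattern" "$2" || printf '%s\n' "$pattern"
    done < "$1"
}

# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Were Alexander-ZBn1gozZoEM.mp4' 'Will Ate-vP-2ahd8pHY.mp4' > "$tmpdir/a.a"
printf '%s\n' 'ZBn1gozZoEM' '0maL4cQ8zuU' 'vP-2ahd8pHY' > "$tmpdir/p.p"

result=$(unmatched_patterns "$tmpdir/p.p" "$tmpdir/a.a")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the question's data this should print only 0maL4cQ8zuU. Note that it still starts one grep per pattern, so it can be slow for a very long patterns file.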
CodePudding user response:
Suggesting an awk script that scans a.a once:
script.awk
FNR==NR{wordsArr[$0] = 1; next}   # read the pattern list from the 1st file into array wordsArr
{                                 # for each line in the 2nd file
    for (i in wordsArr){          # iterate over all patterns still in the array
        if ($0 ~ i) delete wordsArr[i];   # if the pattern matches the current line, remove it from the array
    }
}
END {for (i in wordsArr) print "Unmatched: " i}   # print all patterns left in wordsArr
Running script.awk:
awk -f script.awk p.p a.a
Testing:
p.p
aa
bb
cc
dd
ee
a.a
ddd
eee
ggg
fff
aaa
test:
awk -f script.awk p.p a.a
Unmatched: bb
Unmatched: cc
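If the entries in p.p are meant as literal text rather than regular expressions, a possible variant is to test with index() instead of the ~ operator. The sketch below is only an illustration under that assumption: it inlines the awk program in a shell script, uses a hypothetical array name pats, rebuilds the test files above in a scratch directory, and pipes through sort because awk does not guarantee the order of a for (p in pats) loop.

```shell
# Recreate the test files from above in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' aa bb cc dd ee > "$tmpdir/p.p"
printf '%s\n' ddd eee ggg fff aaa > "$tmpdir/a.a"

unmatched=$(awk '
    FNR == NR { pats[$0] = 1; next }           # 1st file: remember every pattern
    {
        for (p in pats)                        # 2nd file: check each remaining pattern
            if (index($0, p)) delete pats[p]   # literal substring found: pattern is matched
    }
    END { for (p in pats) print p }            # whatever is left never matched
' "$tmpdir/p.p" "$tmpdir/a.a" | sort)

printf '%s\n' "$unmatched"
rm -rf "$tmpdir"
```

For the test data this should list bb and cc, one per line.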
CodePudding user response:
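# Look up each pattern from p.p in a.a and print the pattern.
# -I {} replaces the deprecated -i; the pattern is passed as $1 so the shell never re-parses it.
# If grep succeeds (pattern matched in a.a):
xargs -I {} sh -c 'grep -q -e "$1" a.a && echo "$1"' sh {} < p.p
# If grep fails (pattern NOT matched in a.a <--- what you need):
xargs -I {} sh -c 'grep -q -e "$1" a.a || echo "$1"' sh {} < p.p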
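A different technique, not one of the commands above, avoids starting one grep per pattern. It assumes the patterns are literal strings and that your grep supports -o (GNU and BSD grep do; it is not in POSIX). The idea is to collect the matched text first and then subtract it from p.p. The scratch directory and the file name matched are only illustrative.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Were Alexander-ZBn1gozZoEM.mp4' 'Will Ate-vP-2ahd8pHY.mp4' > "$tmpdir/a.a"
printf '%s\n' 'ZBn1gozZoEM' '0maL4cQ8zuU' 'vP-2ahd8pHY' > "$tmpdir/p.p"

# Step 1: -o prints only the matched text, so this lists the patterns that hit.
grep -oFf "$tmpdir/p.p" "$tmpdir/a.a" | sort -u > "$tmpdir/matched"

# Step 2: keep the lines of p.p that do not appear, as whole lines (-x), in that list.
missing=$(grep -vxFf "$tmpdir/matched" "$tmpdir/p.p")
printf '%s\n' "$missing"
rm -rf "$tmpdir"
```

For the question's data this should print 0maL4cQ8zuU. One caveat: when patterns overlap or one is a substring of another, -o reports only one of them at a given position, so a pattern can be listed as missing even though it does occur. The per-pattern loops above do not have that problem.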