In bash, how can I find a pattern in one file that doesn't match any line of another file?

Time:05-29

How can I find a pattern in one file that doesn't match any line of another file?

I'm aware that grep has a -f option, so instead of feeding grep a pattern, I can feed it a file of patterns.

(a.a is my main file)

user@system:~/test# cat a.a
Were Alexander-ZBn1gozZoEM.mp4
Will Ate-vP-2ahd8pHY.mp4

(p.p is my file of patterns)

user@system:~/test# cat p.p
ZBn1gozZoEM
0maL4cQ8zuU
vP-2ahd8pHY

So the command might be something like

somekindofgrep p.p a.a

but it should give 0maL4cQ8zuU, which is the pattern in the file of patterns, p.p, that doesn't match anything in the file a.a.

I am not sure what command to use.

$grep -f p.p a.a<ENTER>
Were Alexander-ZBn1gozZoEM.mp4
Will Ate-vP-2ahd8pHY.mp4
$

I know that if a.a had an additional line not matched by any pattern in p.p, then grep -f p.p a.a wouldn't show it. And if I ran grep -v -f p.p a.a, it would show only that line of a.a, the one not matched by any pattern in p.p.
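To illustrate the two behaviours, here is a minimal sketch. The third line of a.a and the temporary directory are made up for the demo, and the variable names are hypothetical.

```bash
#!/bin/bash
# Recreate the two files in a temporary directory (demo data only).
tmpdir=$(mktemp -d)
printf '%s\n' \
    'Were Alexander-ZBn1gozZoEM.mp4' \
    'Will Ate-vP-2ahd8pHY.mp4' \
    'Extra line-XXXXXXXXXXX.mp4' > "$tmpdir/a.a"   # hypothetical unmatched line
printf '%s\n' 'ZBn1gozZoEM' '0maL4cQ8zuU' 'vP-2ahd8pHY' > "$tmpdir/p.p"

# Lines of a.a that match at least one pattern from p.p.
matched=$(grep -f "$tmpdir/p.p" "$tmpdir/a.a")

# Lines of a.a that match none of the patterns from p.p.
not_matched=$(grep -v -f "$tmpdir/p.p" "$tmpdir/a.a")

printf 'matched:\n%s\n' "$matched"
printf 'not matched:\n%s\n' "$not_matched"

rm -rf "$tmpdir"
```

Both commands report lines of a.a. Neither reports a line of p.p, which is what I am after.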

But I'm interested in finding which pattern in p.p (my file of patterns) doesn't match anything in a.a!

I looked at "Make grep print missing queries", but that asker wants everything from both files. Also, one of the answers there mentions -v, but I can't see that applying to my case because -v shows the lines of a file that don't match any pattern. So having or not having -v won't help me, because I'm looking for a pattern that doesn't match any line of a file.

CodePudding user response:

Homemade script:

#!/bin/bash

if [[ $# -eq 2 ]]
then
    patterns="$1"
    mainfile="$2"

    if [[ ! -f "$patterns" ]]
    then
        echo "ERROR: file $patterns does not exist."
        exit 1
    fi
    if [[ ! -f "$mainfile" ]]
    then
        echo "ERROR: file $mainfile does not exist."
        exit 1
    fi
else
    echo "Usage: $0 <PATTERNS FILE> <MAIN FILE>"
    exit 1
fi

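# Read the patterns file line by line; IFS= and -r keep each pattern verbatim.
while IFS= read -r pattern
do
    # -q: exit status only; -e: the next argument is the pattern, even if it starts with "-".
    if ! grep -q -e "$pattern" "$mainfile"
    then
        echo "$pattern"
    fi
done < "$patterns"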

As user1934428 suggested, this script loops over the patterns in p.p and prints each pattern that is not found in a.a. It runs grep once per pattern, so a.a is read once for every line of p.p.
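If the patterns are literal strings (as the IDs in the question appear to be), a pipeline can avoid the per-pattern loop. This is a different technique from the script above, offered only as a sketch. The function name unmatched_patterns is hypothetical, and it assumes a grep that supports -o and reading the pattern list from standard input with -f -. It can also misreport a pattern that only ever occurs inside, or overlapping, a longer pattern's match, because grep -o prints one match per position.

```bash
#!/bin/bash
# unmatched_patterns PATTERNS_FILE MAIN_FILE   (hypothetical helper)
unmatched_patterns() {
    # 1) grep -oF -f: print only the matched text, treating patterns as literal strings.
    # 2) sort -u:     collapse repeated hits.
    # 3) grep -vxF -f -: keep the lines of the patterns file that equal none of the hits.
    grep -oFf "$1" "$2" | sort -u | grep -vxFf - "$1"
}

# Demo with the files from the question, recreated in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Were Alexander-ZBn1gozZoEM.mp4' 'Will Ate-vP-2ahd8pHY.mp4' > "$tmpdir/a.a"
printf '%s\n' 'ZBn1gozZoEM' '0maL4cQ8zuU' 'vP-2ahd8pHY' > "$tmpdir/p.p"

result=$(unmatched_patterns "$tmpdir/p.p" "$tmpdir/a.a")
echo "$result"

rm -rf "$tmpdir"
```

For the question's data this should print 0maL4cQ8zuU. Each file is read a fixed number of times regardless of how many patterns there are.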

CodePudding user response:

I suggest an awk script that scans a.a only once:

script.awk

FNR==NR{wordsArr[$0] = 1; next} # read patterns list from 1st file into array wordsArr
{ # for each line in 2nd file
  for (i in wordsArr){ # iterate over all patterns in array
    if ($0 ~ i) delete wordsArr[i]; # if pattern is matched to current line remove the pattern from array
  }
}
END {for (i in wordsArr) print "Unmatched: " i} # print all patterns left in wordsArr (output order is unspecified)

Running script.awk:

awk -f script.awk p.p a.a

Testing:

p.p

aa
bb
cc
dd
ee

a.a

ddd
eee
ggg
fff
aaa

test:

awk -f script.awk p.p a.a
Unmatched: bb
Unmatched: cc

CodePudding user response:

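# Check each pattern from p.p against a.a and print the pattern.
# -I {} replaces the deprecated -i; the pattern is passed as $1 so the shell does not re-interpret it.
# If grep succeeds (pattern matched in a.a):
xargs -I {} sh -c 'grep -q -e "$1" a.a && echo "$1"' sh {} < p.p
# If grep fails (pattern NOT matched in a.a <--- what you need):
xargs -I {} sh -c 'grep -q -e "$1" a.a || echo "$1"' sh {} < p.p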
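As a quick check against the files from the question, the following sketch recreates them in a temporary directory and runs the "not matched" form. The temporary paths and the variable name missing are assumptions for the demo. The pattern is handed to sh as a positional argument, and xargs itself still interprets quotes and backslashes in its input, so this suits simple patterns like the ones shown.

```bash
#!/bin/bash
# Recreate the question's files (demo only).
tmpdir=$(mktemp -d)
printf '%s\n' 'Were Alexander-ZBn1gozZoEM.mp4' 'Will Ate-vP-2ahd8pHY.mp4' > "$tmpdir/a.a"
printf '%s\n' 'ZBn1gozZoEM' '0maL4cQ8zuU' 'vP-2ahd8pHY' > "$tmpdir/p.p"

# One sh per pattern: $1 is the pattern, $2 is the file to search.
# Print the pattern only when grep finds no matching line.
missing=$(xargs -I {} sh -c 'grep -q -e "$1" "$2" || printf "%s\n" "$1"' sh {} "$tmpdir/a.a" < "$tmpdir/p.p")
echo "$missing"

rm -rf "$tmpdir"
```

This should print 0maL4cQ8zuU, the only pattern with no matching line in a.a.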