Find lines from one file that do not appear (even partially) in another file

Time:11-23

I need a Unix shell command that finds the lines of file1 that do not appear anywhere in file2, not even as part of a longer line. For example:

file1:

aaa
bbb

file2:

aaaccc
bb

Expected output:

bbb

("aaa" from file1 does appear in file2, as a part of a larger string "aaaccc").

I can't use "comm" since it only works on complete lines. In this case I also want to exclude lines of file1 that occur in file2 as part of a longer string, as explained above.

Note: I'd prefer a fast solution, if one exists, since my files are VERY large.

CodePudding user response:

Here is one in awk. mawk is probably the fastest implementation, so use that one if it is available:

$ awk '
NR==FNR {                # process file1
    a[$0]                # hash all records to memory
    next                 # process next record
}
{                        # process file2
    for(i in a)          # for each file1 entry in memory
        if(index($0, i)) # see if it occurs literally in current file2 record
            delete a[i]  # and delete if found
}
END {                    # in the end
    for(i in a)          # all left from file1
        print i          # are outputted
}' file1 file2           # mind the order

Output:

bbb
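
To try this on the sample data, here is a small self-contained sketch. It wraps the same idea in a shell function and compares with index(), so each file1 line is treated as a literal string rather than a regular expression (with `~`, a line such as "a.c" would also match "abc"). The function name not_contained, the temporary directory, and the extra "a.c"/"abc" test lines are my own additions for illustration, not part of the question. Empty lines in file1 are skipped, which is an assumption about what you want.

```shell
#!/bin/sh
# Sketch: print lines of the first file that occur nowhere in the second
# file, not even as a substring. not_contained is a hypothetical helper name.

not_contained() {
    awk '
    NR == FNR {                      # first file
        if ($0 != "") a[$0]          # remember each non-empty line
        next
    }
    {                                # second file
        for (i in a)                 # test every line still remembered
            if (index($0, i))        # literal substring test, no regex
                delete a[i]
    }
    END {
        for (i in a) print i         # whatever is left was never seen
    }' "$1" "$2"
}

# Sample data from the question, plus one line with a regex metacharacter.
tmp=$(mktemp -d)
printf '%s\n' 'aaa' 'bbb' 'a.c' > "$tmp/file1"
printf '%s\n' 'aaaccc' 'bb' 'abc' > "$tmp/file2"

# awk does not guarantee any order for "for (i in a)", so sort the result.
result=$(not_contained "$tmp/file1" "$tmp/file2" | sort)
printf '%s\n' "$result"

rm -rf "$tmp"
```

This prints "a.c" and "bbb". Keep in mind that every file2 line is checked against every remaining file1 line, so the run time grows roughly with the product of the two file sizes, and all of file1 has to fit in memory.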