I need a Unix shell command to find the lines of file1 that do not appear anywhere in file2, not even as a substring of a longer line. For example:
file1:
aaa
bbb
file2:
aaaccc
bb
Expected output:
bbb
("aaa" from file1 does appear in file2, as part of the larger string "aaaccc".)
I can't use "comm" since it only compares complete lines. I also want to exclude lines of file1 that appear in file2 only as part of a larger string, as explained above.
I'd prefer a fast method if one exists, since my files are VERY large.
CodePudding user response:
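Here is one in awk. mawk is probably the fastest implementation, so use it if you have it. Note that every file2 line is checked against each file1 line still in memory, so run time grows with the product of the two file sizes: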
$ awk '
NR==FNR {                    # process file1
    a[$0]                    # hash all records to memory
    next                     # process next record
}
{                            # process file2
    for(i in a)              # for each file1 entry still in memory
        if(index($0,i))      # literal substring test (~ would treat i as a regex)
            delete a[i]      # delete the entry if found
}
END {                        # in the end
    for(i in a)              # everything left from file1
        print i              # is output (order is unspecified)
}' file1 file2               # mind the order
Output:
bbb
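
To try this end to end, here is a minimal sketch that recreates the sample files in a scratch directory and runs the same logic. The function name not_in and the file names f1 and f2 are made up for illustration. The sketch uses index() for a literal substring test, because a regex match with ~ would treat characters such as "." as wildcards; the second case shows why that matters.

```shell
#!/bin/sh
# Recreate the sample inputs from the question in a scratch directory.
dir=$(mktemp -d)
printf 'aaa\nbbb\n'   > "$dir/file1"
printf 'aaaccc\nbb\n' > "$dir/file2"

# Hypothetical helper: print the lines of $1 that occur nowhere in $2,
# not even as a substring of a longer line.
not_in() {
    awk '
    NR==FNR { a[$0]; next }                          # load first file
    { for (i in a) if (index($0, i)) delete a[i] }   # literal match
    END { for (i in a) print i }' "$1" "$2"
}

result=$(not_in "$dir/file1" "$dir/file2")
echo "$result"

# A line containing a regex metacharacter: "a.c" is not a literal
# substring of "abc", so it is kept. With "$0 ~ i" it would be
# dropped, because "." matches any character.
printf 'a.c\n' > "$dir/f1"
printf 'abc\n' > "$dir/f2"
meta=$(not_in "$dir/f1" "$dir/f2")
echo "$meta"

rm -rf "$dir"
```

The first echo prints bbb and the second prints a.c. One assumption to be aware of: the NR==FNR idiom expects the first file to be non-empty, otherwise the lines of the second file are loaded as if they were the first.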