File one:
pl
a
ff
c
b
nn
File two:
b
a
z
k
c
d
I want to remove from the first file all lines that also appear in the second (the common lines), while preserving the original order of the lines in one.
A line-by-line check is possible, something like:
while IFS= read -r line; do
    if ! grep -qFx -- "$line" two; then
        echo "$line" >> one_only
    fi
done < one
But that is unlikely to be the fastest option. Another way is to use the "comm" command on sorted copies of the files:
comm -12 <(sort one) <(sort two) \
| tr '\n' '|' \
| sed 's/|/\\|/g;s/\\|$/\n/' \
> common_lines
grep -vx "$(cat common_lines)" one
The first command builds a "logical OR" pattern:
a\|b\|c
which grep can then reuse to exclude the common lines from file one (the -x flag keeps short patterns from matching inside longer names). This preserves the original order of the lines in one, and the result is:
pl
ff
nn
Can you suggest any other ideas to reduce the compute time even further? The real input files are much larger and contain short names (software names):
blender
gimp
vim
emacs
mozilla-firefox
google-earth
and so on...
CodePudding user response:
Using the lines of two as patterns, with grep:

grep -Fxv -f two one
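Here -F treats each pattern as a fixed string rather than a regular expression, -x requires the whole line to match (which matters with short names like vim), -f two reads the patterns from file two, and -v inverts the match, printing only the lines of one that do not appear in two. On the sample files above this prints:

pl
ff
nn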
CodePudding user response:
awk 'NR==FNR{a[$0]; next} !($0 in a)' two one
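NR==FNR is true only while awk reads its first file argument (two), so a[$0]; next records each of its lines as an array key and skips ahead; for the second file (one), the condition !($0 in a) prints only the lines that were never seen in two, preserving their order. As a sketch, the same logic spelled out with comments (identical behavior, array renamed seen for readability):

awk '
    NR == FNR {       # true only while reading the first file (two)
        seen[$0]      # record each line of two as an array key
        next          # skip the print rule for lines of two
    }
    !($0 in seen)     # for lines of one, print those not present in two
' two one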
Regarding the while read loop in your question, please read why-is-using-a-shell-loop-to-process-text-considered-bad-practice.