Exclude from file one the lines in common with file two and preserve line order


File one:

pl
a
ff
c
b
nn

File two:

b
a
z
k
c
d

I want to remove from the first file all lines that are also present in the second file (the common lines), while keeping the original line order of file one.

It's possible to check line by line, something like:

# For each line of "one", keep it only if it does not appear as a whole line in "two"
while read -r line; do
  if ! grep -qxF "$line" two; then
    echo "$line" >> one_only
  fi
done < one

But that is likely not the fastest option. Another way would be to use the "comm" command on previously sorted copies of the files:

# Collect the lines common to both files (comm -12 on the sorted files),
# then join them into a single GNU grep alternation pattern: a\|b\|c
comm -1 -2 <(sort one) <(sort two) \
  | tr '\n' '|' \
  | sed 's/|/\\|/g;s/\\|$/\n/' \
  > common_lines

# Filter "one" with that pattern; the original line order is preserved
grep -v "$(cat common_lines)" one

The first command builds a "logical OR" pattern:

a\|b\|c

which can then be reused with grep to exclude the common lines from the "one" file. This way the original order of the lines in "one" is preserved, and the result will be:

pl
ff
nn
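
For reference, a minimal sketch of the same idea that skips building a regex pattern, assuming bash process substitution and grep's -v, -x, -F and -f options: the common lines produced by comm are passed to grep as fixed, whole-line patterns to exclude.

# Sketch: exclude the common lines (comm -12) as fixed whole-line patterns
grep -vxF -f <(comm -12 <(sort one) <(sort two)) one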

Can you suggest any other ideas to reduce the compute time even further?

The real input files are much larger and contain short names (software names):

blender
gimp
vim
emacs
mozilla-firefox
google-earth

and so on...

CodePudding user response:

Use the lines of two as the patterns:

grep -Fx -f two -v one
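
Here -F treats each pattern as a fixed string, -x requires a whole-line match, -f two reads the patterns from file two, and -v inverts the match; one is read sequentially, so its line order is preserved. With the sample files from the question this should print:

pl
ff
nn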

CodePudding user response:

awk 'NR==FNR{a[$0]; next} !($0 in a)' two one
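
A minimal annotated version of the same one-liner, for readability (the behavior should be identical):

awk '
  NR == FNR { a[$0]; next }   # first argument (two): store each line as an array key
  !($0 in a)                  # second argument (one): print only lines not stored above
' two one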

Regarding the while read loop in your question, please read "Why is using a shell loop to process text considered bad practice?"
