File one:
pl
a
ff
c
b
nn
File two:
b
a
z
k
c
d
I want to remove from the first file all lines that also appear in the second (the common lines), while preserving the original order of the lines in one.
A line-by-line check is possible, something like:
while IFS= read -r line; do
    if ! grep -qFx -- "$line" two; then
        echo "$line" >> one_only
    fi
done < one
But that is unlikely to be the fastest option. Another way is to use the "comm" command on sorted copies of the files:
comm -12 <(sort one) <(sort two) \
| tr '\n' '|' \
| sed 's/|/\\|/g;s/\\|$/\n/' \
> common_lines
grep -vx "$(cat common_lines)" one
The first command builds a "logical OR" pattern:
a\|b\|c
which grep can then reuse to exclude the common lines from file one (the -x flag keeps short patterns from matching inside longer names). This preserves the original order of the lines in one, and the result is:
pl
ff
nn
Can you suggest any other ideas to reduce the compute time even further? The real input files are much larger and contain short names (software names):
blender
gimp
vim
emacs
mozilla-firefox
google-earth
and so on...
CodePudding user response:
Using the lines of two as patterns, with grep:

grep -Fxv -f two one
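Here -F treats each pattern as a fixed string rather than a regular expression, -x requires the whole line to match (which matters with short names like vim), -f two reads the patterns from file two, and -v inverts the match, printing only the lines of one that do not appear in two. On the sample files above this prints:

pl
ff
nn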
CodePudding user response:
awk 'NR==FNR{a[$0]; next} !($0 in a)' two one
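NR==FNR is true only while awk reads its first file argument (two), so a[$0]; next records each of its lines as an array key and skips ahead; for the second file (one), the condition !($0 in a) prints only the lines that were never seen in two, preserving their order. As a sketch, the same logic spelled out with comments (identical behavior, array renamed seen for readability):

awk '
    NR == FNR {       # true only while reading the first file (two)
        seen[$0]      # record each line of two as an array key
        next          # skip the print rule for lines of two
    }
    !($0 in seen)     # for lines of one, print those not present in two
' two one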
Regarding the while read loop in your question, please read why-is-using-a-shell-loop-to-process-text-considered-bad-practice.