I have two lists, list1 and list2, with a filename on each line. I want a result with all filenames that are only in list2 and not in list1, regardless of certain file extensions (but not all extensions). I am using Linux bash and would like to use only commands that do not require any extra installations. In the example lists, I know all the file extensions that I wish to ignore. I made an attempt, but it does not work at all and I don't know how to fix it. Apologies for my inexperience.
I wish to ignore the following extensions: .x .xy .yx .y .jpg
list1.txt:
text.x
example.xy
file.yx
data.y
edit
edit.jpg
list2.txt:
text
rainbow.z
file
data.y
sunshine
edit.test.jpg
edit.random
result.txt:
rainbow.z
sunshine
edit.test.jpg
edit.random
My try:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Edit: I forgot two requirements. The filenames can have . in them, and not every filename has an extension. I know the extensions that must be ignored. I amended the lists accordingly.
CodePudding user response:
An awk solution might be more efficient for this task:
awk '
{ f=$0; sub(/\.(xy?|yx?|jpg)$/,"",f) }
NR==FNR { a[f]; next }
!(f in a)
' list1.txt list2.txt > result.txt
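For anyone who wants to try this without touching real files, here is a minimal sketch of the same awk program run against the sample lists from the question. The scratch directory (workdir) and the array name seen are assumptions made for the illustration; the regular expression is the one from the answer.

```shell
# Sample data copied from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$workdir/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$workdir/list2.txt"

awk '
  # f is the current line with one ignorable extension removed
  { f = $0; sub(/\.(xy?|yx?|jpg)$/, "", f) }
  # first file: remember each stripped name and print nothing
  NR == FNR { seen[f]; next }
  # second file: print the original line when its stripped name is new
  !(f in seen)
' "$workdir/list1.txt" "$workdir/list2.txt" > "$workdir/result.txt"

cat "$workdir/result.txt"
```

Because the comparison uses the stripped name but the print uses the original line, the output keeps the list2 order and the original spelling (edit.test.jpg stays edit.test.jpg).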
CodePudding user response:
comm can do precisely this.
You can preprocess the input:
- strip the suffixes
- sort (comm expects sorted input)
- remove duplicates
ss()( sed 's/\.\(x\|xy\|yx\|y\|jpg\)$//' "$@" | sort -u )
comm -13 <(ss list1.txt) <(ss list2.txt) >result.txt
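As a rough illustration of what this pipeline produces, the sketch below runs the same steps on the sample lists from the question. The helper name strip_sorted, the scratch directory, the use of sed -E in place of the escaped alternation, and the fixed C locale are all assumptions made for the example.

```shell
export LC_ALL=C   # assumption: fix the collation so sort and comm agree

# Sample data copied from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$workdir/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$workdir/list2.txt"

# strip_sorted (hypothetical name for the ss helper): strip one ignorable
# extension from every line, then sort and drop duplicates.
strip_sorted() { sed -E 's/\.(x|xy|yx|y|jpg)$//' "$@" | sort -u; }

# -13 suppresses column 1 (only in list1) and column 3 (in both),
# leaving the lines that appear only in the second input.
comm -13 <(strip_sorted "$workdir/list1.txt") \
         <(strip_sorted "$workdir/list2.txt") > "$workdir/result.txt"

cat "$workdir/result.txt"
```

Note that comm compares and prints the preprocessed lines, so the result is sorted and carries the stripped names: edit.test.jpg from list2 shows up as edit.test. If the original spelling or order matters, the stripped names would still need to be mapped back to the list2 lines.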
Your code was:
while read LINE
do
line2=$LINE
sed -i 's/\.x$//g' $LINE $line2
sed -i 's/\.xy$//g' $LINE $line2
sed -i 's/\.yx$//g' $LINE $line2
sed -i 's/\.y$//g' $LINE $line2
then sed -i -e '$line' result.txt;
fi
done < list2.txt
Some issues that immediately jump out:
- syntax error - then/fi but no matching if
- you never access list1
- you don't quote variables when you use them, so whitespace and special characters will cause problems
- while read ... sed ... sed ... sed ... is inefficient - multiple invocations of sed instead of just one, and a loop that sed would perform implicitly
- sed expects file arguments, not strings
- sed -i will try to overwrite input file arguments
- you use result.txt as both input and output to sed but never assign any contents to it
- you try to use data ($line) as sed commands, instead of applying sed commands to that data
- because you used single-quotes, sed -i -e '$line' will attempt to run a (non-existent) sed command "line" on the last line of input ($)
- the g option to s/// does nothing when the search is anchored
CodePudding user response:
I'd use join:
$ join -t. -j1 -v2 -o 2.1,2.2 <(sort list1.txt) <(sort list2.txt) | sed 's/\.$//'
rainbow.z
sunshine
(The bit of sed is needed to turn sunshine. into sunshine)
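To make the options easier to follow, here is a hedged sketch that runs the same join command on the sample lists as they stand after the question's edit. The scratch directory and the fixed C locale are assumptions added for reproducibility.

```shell
export LC_ALL=C   # assumption: keep sort and join on one collation order

# Sample data copied from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' text.x example.xy file.yx data.y edit edit.jpg > "$workdir/list1.txt"
printf '%s\n' text rainbow.z file data.y sunshine edit.test.jpg edit.random > "$workdir/list2.txt"

# -t.          split each line into fields at "."
# -j1          join on field 1, the text before the first dot
# -v2          print only list2 lines that have no partner in list1
# -o 2.1,2.2   print the first two fields of the list2 line
join -t. -j1 -v2 -o 2.1,2.2 \
  <(sort "$workdir/list1.txt") <(sort "$workdir/list2.txt") |
  sed 's/\.$//' > "$workdir/result.txt"

cat "$workdir/result.txt"
```

Since the join key is everything before the first dot, this treats any extension as ignorable, not just the listed ones. On the amended lists, edit.test.jpg and edit.random therefore pair with edit from list1 and are left out, which is why only rainbow.z and sunshine are printed.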