I have a script that searches for files whose names share the same main part (everything before the extension),
e.g. "IMAGE123.MOV" and "IMAGE123.JPG".
The script looks like this:
find . -iname "*.mov" -o -iname "*.jpg" | cut -d'.' -f2 | uniq -d |uniq | awk '{print "."$1".mov"}'
I could pipe this to "rm" with xargs to delete the files. The problem is that I no longer know whether a file had a ".MOV" or a ".mov" suffix. I use awk to append a lowercase ".mov", but I am not sure that is correct, because the original case gets lost along the command chain.
Any ideas?
CodePudding user response:
I think it's easier to find *.mov and test if there is a corresponding *.jpg. Here is how I'd do it:
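find . -name '*.[Mm][Oo][Vv]' -exec sh -c '
# "for mov" loops over the file names that find passes as arguments
for mov; do
  # ${mov%.*} is the path without its extension
  for jpg in "${mov%.*}".[Jj][Pp][Gg]; do
    if test -f "$jpg"; then
      echo rm "$mov"
    fi
    break
  done
done' sh {} +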
Remove echo if the output looks good.
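The core of that loop is the `${mov%.*}` expansion, which strips the shortest trailing `.something` from the path, followed by a glob that accepts any capitalization of `jpg`. As a minimal sketch, the check can be pulled out into a small function. The name `mov_has_jpg` is hypothetical and only used for illustration, and unlike the loop above it tests every glob match rather than only the first one.

```shell
# mov_has_jpg FILE
# Succeeds if a regular file with the same stem as FILE and a .jpg
# extension (in any capitalization) exists next to it.
# Hypothetical helper name, not part of the answer above.
mov_has_jpg() {
  stem=${1%.*}                      # ./IMAGE123.MOV -> ./IMAGE123
  for jpg in "$stem".[Jj][Pp][Gg]; do
    # If nothing matches, the pattern stays literal and -f fails.
    [ -f "$jpg" ] && return 0
  done
  return 1
}
```

It could be used as `mov_has_jpg ./IMAGE123.MOV && echo rm ./IMAGE123.MOV`, which keeps the file name exactly as it is spelled on disk.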
CodePudding user response:
Ironically, sort -u is more powerful than uniq. You can use it without cut.
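find . -iname "*.mov" -o -iname "*.jpg" |
sort -f -u -t. -k2,2
The parameters of sort mean the following:
-f: Ignore case
-u: Unique
-k2,2: Consider only the second field for uniqueness/sorting. Every path printed by find . starts with "./", so the first dot-separated field is empty and the name is in the second one.
-t.: Consider the dot as the field separator
This will return the first occurrence of each filename, no matter the case or the file extension, as long as no directory name contains a dot.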
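A sort-based pipeline keeps one line per name, but it does not say which .mov files actually have a .jpg partner. One way to list exactly those files with their original spelling is to let awk remember the stems that have a .jpg and print the matching .mov paths at the end. This is only a sketch: the function name `list_paired_movs` is hypothetical, stems are compared case-sensitively while extensions are not, and file names containing newlines are assumed not to occur.

```shell
# list_paired_movs DIR
# Prints every .mov file (any capitalization) under DIR for which a
# .jpg file (any capitalization) with the same stem exists.
# Hypothetical helper; assumes no newlines in file names.
list_paired_movs() {
  find "$1" -type f \( -iname '*.mov' -o -iname '*.jpg' \) |
  awk '
    {
      ext = $0
      sub(/.*\./, "", ext)                                # text after the last dot
      stem = substr($0, 1, length($0) - length(ext) - 1)  # path without extension
      if (tolower(ext) == "jpg") {
        has_jpg[stem] = 1
      } else {
        n++
        mov[n] = $0          # original spelling, e.g. ./IMAGE123.MOV
        mov_stem[n] = stem
      }
    }
    END {
      for (i = 1; i <= n; i++)
        if (mov_stem[i] in has_jpg) print mov[i]
    }
  '
}
```

Running `list_paired_movs .` prints the candidates for review first. The output order follows find and is not sorted.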