Retain original suffix in piped command chain

I have a script that searches for files that share the same base name but have different extensions:

E.g. "IMAGE123.MOV" and "IMAGE123.JPG".

The script looks like this:

find . -iname "*.mov" -o -iname "*.jpg" | cut -d'.' -f2 | uniq -d |uniq | awk '{print "."$1".mov"}'

I could pipe this to "rm" with xargs to delete the files. The problem is that I do not know whether the file had a ".MOV" or a ".mov" suffix. I use awk to append a lowercase ".mov", but that may not match the real file name, because the original suffix gets lost earlier in the command chain.
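For example, feeding two sample names through the cut step alone shows where the suffix disappears. This is only a quick sketch with made-up names, and the variable stems exists just for the demonstration:

```shell
# Two sample paths as find would print them (names are illustrative only).
stems=$(printf '%s\n' ./IMAGE123.MOV ./IMAGE123.JPG | cut -d'.' -f2)

# Show what is left after cut.
printf '%s\n' "$stems"
```

Both lines come out as /IMAGE123, so nothing after this point can tell ".MOV" from ".mov".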

Any ideas?

CodePudding user response：

I think it's easier to find *.mov and test if there is a corresponding *.jpg. Here is how I'd do it:

find . -name '*.[Mm][Oo][Vv]' -exec sh -c '
for mov; do
  for jpg in "${mov%.*}".[Jj][Pp][Gg]; do
    if test -f "$jpg"; then
      echo rm "$mov"
    fi
    break
  done
done' sh {} +

Remove echo if the output looks good.
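One low-risk way to check the output is to run the command in a throwaway directory first. The sketch below assumes mktemp -d is available and uses made-up file names. It runs the command from this answer with the -exec terminated by + and keeps the echo, so nothing is deleted:

```shell
# Scratch directory with made-up sample files (names are illustrative only).
dir=$(mktemp -d)
touch "$dir/IMAGE123.MOV" "$dir/IMAGE123.JPG" "$dir/IMAGE456.mov"

# Dry run: echo is kept, so the rm commands are only printed.
dry_run=$(cd "$dir" && find . -name '*.[Mm][Oo][Vv]' -exec sh -c '
for mov; do
  for jpg in "${mov%.*}".[Jj][Pp][Gg]; do
    if test -f "$jpg"; then
      echo rm "$mov"
    fi
    break
  done
done' sh {} +)

printf '%s\n' "$dry_run"

# Clean up the scratch directory.
rm -r "$dir"
```

Only IMAGE123.MOV should be listed, with its suffix spelled exactly as on disk, because IMAGE456.mov has no .jpg partner.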

CodePudding user response：

Ironically, sort -u is more powerful than uniq. You can use it without cut.

find . -iname "*.mov" -o -iname "*.jpg" |
sort -f -u -k2,2 -t.

The parameters of sort mean the following:

-f Ignore case
-u Unique
-k2,2 Consider only the second field for uniqueness/sorting (find prints ./NAME.EXT, so the first field, before the leading dot, is empty)
-t. Consider the dot as the field separator

This will return the first occurrence of each filename, no matter the case or the file extension.
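Because whole lines are passed along, the original suffix survives. To go one step further and print only the .mov files that have a .jpg partner, awk can carry the full path along in the same way. This is only a sketch and not part of the sort command: the function name list_paired_movs is made up, stems are compared case-insensitively, and file names are assumed to contain no newlines.

```shell
# Hypothetical helper: print each .mov path, spelled exactly as on disk,
# that has a .jpg with the same stem (stems compared case-insensitively).
list_paired_movs() {
  find . -type f \( -iname '*.mov' -o -iname '*.jpg' \) |
  awk '{
    stem = $0
    sub(/\.[^.]*$/, "", stem)                    # drop the final extension
    ext = tolower(substr($0, length(stem) + 2))  # extension without the dot
    key = tolower(stem)
    if (ext == "mov") mov[key] = mov[key] $0 "\n"  # remember the full path
    else              jpg[key] = 1                 # note that a .jpg exists
  }
  END { for (k in mov) if (k in jpg) printf "%s", mov[k] }'
}
```

Run list_paired_movs from the top directory and review the list before handing it to anything that deletes files.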