Retain original suffix in piped command chain

Time:09-05

I have a script that searches for files that share the same base name and differ only in their extension:

E.g. "IMAGE123.MOV" and "IMAGE123.JPG".

The script looks like this:

find . -iname "*.mov" -o -iname "*.jpg" | cut -d'.' -f2 | uniq -d |uniq | awk '{print "."$1".mov"}'

I could pipe this to "rm" with xargs to delete the files. The problem is that I no longer know whether the file had a ".MOV" or a ".mov" suffix. I use awk to append a lowercase ".mov", but I am not sure that is correct, because the original suffix gets lost along the command chain.

Any ideas?

CodePudding user response:

I think it's easier to find *.mov and test if there is a corresponding *.jpg. Here is how I'd do it:

find . -name '*.[Mm][Oo][Vv]' -exec sh -c '
for mov; do
  for jpg in "${mov%.*}".[Jj][Pp][Gg]; do
    if test -f "$jpg"; then
      echo rm "$mov"
    fi
    break
  done
done' sh {} +

Remove echo if the output looks good.
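
The loop above relies on the `${mov%.*}` parameter expansion to drop the suffix without touching its case. A minimal sketch of what that expansion does, using a hypothetical helper named `strip_suffix` that is not part of the answer:

```shell
# strip_suffix is a hypothetical helper, named here only for illustration.
strip_suffix() {
  # ${1%.*} deletes the shortest trailing match of ".*":
  # the last dot and everything after it.
  printf '%s\n' "${1%.*}"
}

strip_suffix "./IMAGE123.MOV"        # prints ./IMAGE123
strip_suffix "./clips/IMAGE123.mov"  # prints ./clips/IMAGE123
```

The glob `"${mov%.*}".[Jj][Pp][Gg]` then matches whichever spelling of the jpg suffix exists on disk. If nothing matches, a POSIX shell leaves the pattern unexpanded, which is why the `test -f` check is still needed.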

CodePudding user response:

Ironically, sort -u is more powerful than uniq. You can use it without cut.

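find . -iname "*.mov" -o -iname "*.jpg" |
sort -f -u -k2,2 -t.

The parameters of sort mean the following:

  • -f Ignore case
  • -u Unique
  • -k2,2 Consider only the second field for uniqueness/sorting. Because find . prefixes every path with "./", the first dot-separated field is empty and the name sits in the second one (this assumes there are no other dots in the path).
  • -t. Consider the dot as the field separator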

This will return the first occurrence of each filename, no matter the case or the file extension.
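
To list only the .mov files that have a .jpg sibling while keeping each name exactly as find printed it, the grouping can be done in awk instead. The following is a sketch under stated assumptions, not a tested replacement for the pipeline above: `paired_movs` is a hypothetical function name, the part before the last dot must match exactly (only the suffix is compared case-insensitively), and paths must not contain newlines.

```shell
# paired_movs is a hypothetical name. It reads one path per line on stdin
# and prints every .mov path (original spelling) whose stem also has a .jpg.
paired_movs() {
  awk '
    {
      path = $0
      ext = path;  sub(/.*\./, "", ext)        # text after the last dot
      stem = path; sub(/\.[^.]*$/, "", stem)   # text before the last dot
      ext = tolower(ext)                       # compare suffixes case-insensitively
      if (ext == "mov") {
        # remember the original path, untouched
        if (stem in mov) mov[stem] = mov[stem] "\n" path
        else mov[stem] = path
      } else if (ext == "jpg") {
        has_jpg[stem] = 1
      }
    }
    END {
      for (stem in mov)
        if (stem in has_jpg) print mov[stem]
    }
  '
}

# Demo on fixed input instead of a real directory tree:
printf '%s\n' ./IMAGE123.MOV ./IMAGE123.JPG ./other.mov | paired_movs
# prints ./IMAGE123.MOV
```

With real files, the input would come from something like `find . -type f \( -iname '*.mov' -o -iname '*.jpg' \) | paired_movs`. Review the list before handing it to rm.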
