I'm trying to do a regex replacement with a lookahead (thus awk and not sed) that removes all dots save the last one, to preserve the extension, e.g. my.big.file.avi > my-big-file.avi. Here's my little bash script:
#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
newFile=$(printf $file | awk '{gsub(/\.(?=.*?\.)/"-");}1')
#ffmpeg -i "$newFile" -vcodec copy -acodec aac "${newFile%.*}_AAC.mp4"
printf "${file} ---> ${newFile}\n"
done
This gives me a "regular expression compile failed (missing operand)" error. I can't see it. Can someone point me to my mistake?
CodePudding user response:
You don't need awk, or regular expressions, for any part of solving this problem; parameter expansion suffices.
#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
dirname=${file%/*} # we don't want to change the directory name
filename=${file##*/} # so split out just the filename
[[ $filename = *.*.* ]] || continue # no compound extension? do nothing
file_start=${filename%.*} # content up to last dot
file_ext=${filename##*.} # content after last dot
newFile=${dirname}/${file_start//./-}.${file_ext} # combine the two
# okay, got what we need, now we can work with it
#ffmpeg -i "$newFile" -vcodec copy -acodec aac "${newFile%.*}_AAC.mp4"
printf '%s ---> %s\n' "$file" "$newFile"
done
But if you want to use regular expressions:
#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
[[ $file =~ ^(.*)/([^/]+)[.]([^/.]+)$ ]] || continue
dirname=${BASH_REMATCH[1]}
file_start=${BASH_REMATCH[2]}
file_ext=${BASH_REMATCH[3]}
newFile=${dirname}/${file_start//./-}.${file_ext}
printf '%s ---> %s\n' "$file" "$newFile"
done
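To try the renaming logic without touching any real files, the parameter-expansion steps can be wrapped in a function. This is only a sketch: the name dots_to_dashes is made up for illustration, and bash is assumed because of the ${var//./-} expansion.

```bash
#!/bin/bash
# dots_to_dashes is a hypothetical helper name, not part of the answer above.
dots_to_dashes() {
  local path=$1 dir='' name stem ext
  name=${path##*/}                        # filename without directories
  if [[ $path == */* ]]; then
    dir=${path%/*}/                       # directory part, left untouched
  fi
  if [[ $name == *.*.* ]]; then           # only act when there are 2+ dots
    stem=${name%.*}                       # content up to the last dot
    ext=${name##*.}                       # content after the last dot
    name=${stem//./-}.${ext}
  fi
  printf '%s%s\n' "$dir" "$name"
}

dots_to_dashes ./some.dir/my.big.file.avi   # ./some.dir/my-big-file.avi
dots_to_dashes name-without-dot             # name-without-dot
```

Dots in directory names are preserved because only the part after the last slash is rewritten.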
CodePudding user response:
And an alternative, nowhere near as elegant as Charles', but maybe it also does the job ...
echo my.big.file.avi | sed -E 's/\./-/g;s/-([^-]+)$/.\1/'
my-big-file.avi
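One caveat with the two-step substitution: a name that contains a hyphen but no dot at all (for example name-without-dot) would have its last hyphen turned into a dot. A possible way around that, sketched here with a made-up function name dedot_sed, is to loop on a single substitution that only touches a dot when another dot follows it:

```bash
#!/bin/bash
# dedot_sed is a hypothetical helper name used only for this sketch.
dedot_sed() {
  # :a            define a label
  # s/\.(.*\.)/   match the first dot that still has another dot after it
  #     -\1/      and replace that first dot with a dash
  # ta            if a substitution happened, jump back to :a and retry
  printf '%s\n' "$1" | sed -E -e ':a' -e 's/\.(.*\.)/-\1/' -e 'ta'
}

dedot_sed my.big.file.avi     # my-big-file.avi
dedot_sed name-without-dot    # name-without-dot
```

The loop stops as soon as only one dot is left, so the extension separator is never rewritten.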
CodePudding user response:
GNU AWK does not support lookaheads as such; the closest it offers are anchors, namely $ for end of line and \> for end of word. Your task, namely
removes all dots save the last one to preserve the extension eg: (my.big.file.avi > my-big-file.avi)
might be accomplished using GNU AWK's functions for working with strings. I would do it as follows. Let file.txt content be
my.big.file.avi
i-do-not-need-change.mp3
name-without-dot
then
awk '{match($0,/[.][^.]*$/); print gensub(/[.]/,"-","g",substr($0,1,RSTART-1)) substr($0,RSTART)}' file.txt
output
my-big-file.avi
i-do-not-need-change.mp3
name-without-dot
Note: I added 2 test cases. Explanation: first, match looks for a literal dot ([.]) followed by zero or more (*) non-dots ([^.]) and then the end of the line ($). This sets RSTART to the position of the last dot in the line. Then I use substr to get the part before the last dot and the part from the last dot onward. In the first part I replace all dots with -, in the second I do nothing, then I concatenate them and print. If you want to know more about the functions I used, read the String Functions docs.
(tested in GNU Awk 5.0.1)
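To see what match sets for the pattern used above, here is a small sketch that prints RSTART and RLENGTH for a single name. The wrapper name last_dot is made up for illustration; match, RSTART and RLENGTH are standard awk.

```bash
#!/bin/bash
# last_dot is a hypothetical helper: it prints RSTART and RLENGTH for the
# same pattern that the answer passes to match().
last_dot() {
  printf '%s\n' "$1" | awk '{ match($0, /[.][^.]*$/); print RSTART, RLENGTH }'
}

last_dot my.big.file.avi     # 12 4
last_dot name-without-dot    # 0 -1
```

When there is no dot, RSTART is 0, so substr($0,1,RSTART-1) is empty and substr($0,RSTART) is the whole line, which is why such lines pass through unchanged.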
Keep in mind that some files have 2 dots in their extension, for example file.tar.gz; my solution does not take that into account.
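If compound extensions matter, one possible approach is to special-case a list of known suffixes before splitting on the last dot. The sketch below is in bash rather than awk; the function name dedot_keep_ext and the suffix list (tar.gz, tar.bz2, tar.xz) are assumptions made for illustration, so extend the list to whatever your files actually use.

```bash
#!/bin/bash
# dedot_keep_ext is a hypothetical helper; the list of compound
# extensions in the case pattern is an assumption, not exhaustive.
dedot_keep_ext() {
  local name=$1 stem ext
  case $name in
    *.tar.gz|*.tar.bz2|*.tar.xz)
      stem=${name%.tar.*}        # everything before ".tar.<x>"
      ext=${name#"$stem"}        # the compound extension, leading dot included
      ;;
    *.*)
      stem=${name%.*}            # everything before the last dot
      ext=.${name##*.}           # the last dot and what follows it
      ;;
    *)
      printf '%s\n' "$name"      # no dot at all: nothing to do
      return
      ;;
  esac
  printf '%s%s\n' "${stem//./-}" "$ext"
}

dedot_keep_ext my.big.archive.tar.gz   # my-big-archive.tar.gz
dedot_keep_ext my.big.file.avi         # my-big-file.avi
```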
(thus awk and not sed)
Scary warning: sed is Turing complete. Ramification: it can do anything any other Turing-complete language can accomplish. That being said, that it can does not mean you should use it.