awk regex compile failed

Time:11-05

Trying to do a regex replacement with a lookahead (thus awk and not sed) that removes all dots save the last one, to preserve the extension, e.g. my.big.file.avi > my-big-file.avi. Here's my little bash script:

#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
        newFile=$(printf $file | awk '{gsub(/\.(?=.*?\.)/"-");}1')
        #ffmpeg -i "$newFile" -vcodec copy -acodec aac "${newFile%.*}_AAC.mp4"
        printf "${file} ---> ${newFile}\n"
done

This gives me a "regular expression compile failed (missing operand)" error...

I can't see it. Can someone point me to my mistake?

Answer:

You don't need awk, or regular expressions, for any part of solving this problem; parameter expansion suffices.

#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
        dirname=${file%/*}    # we don't want to change the directory name
        filename=${file##*/}  # so split out just the filename
        [[ $filename = *.*.* ]] || continue  # no compound extension? do nothing
        file_start=${filename%.*}  # content up to last dot
        file_ext=${filename##*.}   # content after last dot
        newFile=${dirname}/${file_start//./-}.${file_ext} # combine the two
        # okay, got what we need, now we can work with it
        #ffmpeg -i "$newFile" -vcodec copy -acodec aac "${newFile%.*}_AAC.mp4"
        printf '%s ---> %s\n' "$file" "$newFile"
done
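To see what each expansion in the loop produces, the same steps can be wrapped in a function and applied to a single path. This is a minimal sketch; the function name dedot_path is hypothetical (not from the answer), and it assumes bash (the ${var//pattern/replacement} form is not POSIX sh) and a path that contains a slash, as the ./**/ glob guarantees.

```shell
#!/bin/bash
# Hypothetical helper: applies the loop's parameter-expansion steps to one path.
# Assumes the path contains a "/" (the ./**/ glob in the loop always produces one).
dedot_path() {
    local file=$1
    local dir=${file%/*}        # everything before the last slash
    local name=${file##*/}      # everything after the last slash
    # Fewer than two dots in the file name: nothing to replace.
    if [[ $name != *.*.* ]]; then
        printf '%s\n' "$file"
        return
    fi
    local start=${name%.*}      # "my.big.file" from "my.big.file.avi"
    local ext=${name##*.}       # "avi"
    printf '%s\n' "$dir/${start//./-}.$ext"
}

dedot_path ./videos/my.big.file.avi   # prints ./videos/my-big-file.avi
dedot_path ./videos/plain.avi         # prints ./videos/plain.avi
```

Dots in directory names are left alone because only the part after the last slash is rewritten.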

But if you want to use regular expressions:

#!/bin/bash
shopt -s globstar nullglob dotglob
for file in ./**/*.{mpg,mpeg,mkv,avi,mp4}; do
    [[ $file =~ ^(.*)/([^/]+)[.]([^/.]+)$ ]] || continue
    dirname=${BASH_REMATCH[1]}
    file_start=${BASH_REMATCH[2]}
    file_ext=${BASH_REMATCH[3]}
    newFile=${dirname}/${file_start//./-}.${file_ext}
    printf '%s ---> %s\n' "$file" "$newFile"
done
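For the regex variant, it may help to print what each capture group holds. A minimal sketch, assuming bash; the sample path is made up, and the regex is kept in a variable (re), which avoids quoting surprises on the right-hand side of =~.

```shell
#!/bin/bash
# Same regex as above: (directory)/(name up to the last dot).(last extension)
re='^(.*)/([^/]+)[.]([^/.]+)$'
path=./videos/my.big.file.avi
if [[ $path =~ $re ]]; then
    printf 'dir:  %s\n' "${BASH_REMATCH[1]}"   # ./videos
    printf 'name: %s\n' "${BASH_REMATCH[2]}"   # my.big.file
    printf 'ext:  %s\n' "${BASH_REMATCH[3]}"   # avi
fi
```

Because the third group cannot contain a dot, it is forced to be the text after the last dot, and the second group takes everything between the last slash and that dot.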

Answer:

And here's an alternative, nowhere near as elegant as Charles', that may also do the job:

echo my.big.file.avi | sed -E 's/\./-/g;s/-([^-]+)$/.\1/'
my-big-file.avi
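One caveat with this one-liner: a name with no dot at all, such as name-without-dot, has its last hyphen turned into a dot (name-without.dot). A minimal sketch of a guard, assuming GNU sed; the function name dedot_sed is hypothetical. It applies both substitutions only to lines that contain a dot, and it still assumes the extension itself contains no hyphen.

```shell
#!/bin/bash
# Hypothetical wrapper; reads names on stdin, one per line.
# Only lines containing a dot (/\./) get the two substitutions:
#   1) every dot becomes a hyphen
#   2) the last hyphen-separated chunk gets its dot back as the extension
dedot_sed() {
    sed -E '/\./{s/\./-/g;s/-([^-]+)$/.\1/;}'
}

printf '%s\n' my.big.file.avi i-do-not-need-change.mp3 name-without-dot | dedot_sed
# my-big-file.avi
# i-do-not-need-change.mp3
# name-without-dot
```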

Answer:

GNU AWK has only limited support for lookahead-style assertions, namely $ for end of line and \> for end of word; there is no general (?=...) lookahead. Your task, namely

removes all dots save the last one to preserve the extension eg: (my.big.file.avi > my-big-file.avi)

might be accomplished using GNU AWK's string functions. I would do it as follows. Let file.txt content be

my.big.file.avi
i-do-not-need-change.mp3
name-without-dot

then

awk '{match($0,/[.][^.]*$/); print gensub(/[.]/,"-","g",substr($0,1,RSTART-1)) substr($0,RSTART)}' file.txt

output

my-big-file.avi
i-do-not-need-change.mp3
name-without-dot

Note: I added 2 test cases. Explanation: first I use match to look for a literal dot ([.]) followed by zero or more (*) non-dots ([^.]) and then the end of the line ($). This sets RSTART to the position of the last dot in the line. Then I use substr to get the part before the last dot and the part consisting of the last dot and the characters after it. In the 1st part I replace all dots with -, in the 2nd I do nothing; then I concatenate them and print. If a line has no dot, match sets RSTART to 0, so the first substr is empty and the second returns the whole line unchanged. If you want to know more about the functions I used, read the String Functions section of the GNU Awk manual.

(tested in GNU Awk 5.0.1)

Keep in mind that some files have two dots in their extension, for example file.tar.gz; my solution does not take that into account.
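gensub is a gawk extension, so the one-liner above will not run under mawk or BusyBox awk. Here is a sketch of the same match/substr idea using only POSIX functions (gsub on a variable instead of gensub), assuming one name per line on standard input; the wrapper name dedot_awk is hypothetical, and it has the same file.tar.gz limitation.

```shell
#!/bin/bash
# Hypothetical wrapper; reads names on stdin, one per line.
dedot_awk() {
    awk '{
        if (match($0, /[.][^.]*$/)) {          # RSTART = position of the last dot
            head = substr($0, 1, RSTART - 1)   # text before the last dot
            tail = substr($0, RSTART)          # last dot plus the extension
        } else {                               # no dot: leave the line alone
            head = $0
            tail = ""
        }
        gsub(/[.]/, "-", head)                 # replace every dot in the left part
        print head tail
    }'
}

printf '%s\n' my.big.file.avi i-do-not-need-change.mp3 name-without-dot | dedot_awk
```

The explicit else branch avoids relying on how substr treats a start position of 0, which POSIX leaves unspecified.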

(thus awk and not sed)

Scary warning: sed is Turing complete. Ramification: it can do anything any other Turing-complete language can accomplish. That being said, the fact that it can does not mean you should use it.
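As for the lookahead itself: awk regexes are POSIX extended regular expressions, which have neither (?=...) nor the lazy *?. In (?=, the ? has nothing before it to quantify, which is most likely where the "(missing operand)" message comes from; the gsub call in the question is also missing the comma between the regex and "-". A lookahead-free way to get the effect of \.(?=.*\.) ("a dot with another dot somewhere after it") is to keep replacing the first dot while more than one dot remains. A minimal sketch, assuming any POSIX awk; the function name dedot_loop is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper; reads names on stdin, one per line.
dedot_loop() {
    awk '{
        # gsub(/[.]/, "&") rewrites each dot as itself (no change)
        # and returns how many dots the line contains.
        while (gsub(/[.]/, "&") > 1)
            sub(/[.]/, "-")                    # replace the first remaining dot
        print
    }'
}

printf '%s\n' my.big.file.avi name-without-dot | dedot_loop
```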
