Terminal SED regex fails with dash and slash

Time:06-13

I am trying to clean up filenames by removing special characters and whitespace. For some reason my sed regex doesn't work when I exclude both dashes and slashes from the characters to be removed.

Example:

echo "/path/to/file 20-456 (1).jpg" | sed -e 's/ /_/g' -e 's/[^0-9a-zA-Z\.\_\-\/]//g'

Output:

/path/to/file_20456_1.jpg

So the dash is missing. When I try this command instead:

echo "/path/to/file 20-456 (1).jpg" | sed -e 's/ /_/g' -e 's/[^0-9a-zA-Z\.\_\-]//g'

Output:

pathtofile_20-456_1.jpg

The dash is there, but without the directory slashes I can't move the files. I wonder why the dash is no longer preserved once I add \/ to the regex pattern.

Any suggestions?

CodePudding user response:

With your shown samples and attempts, please try the following awk code.

echo "/path/to/file 20-456 (1).jpg" | 
awk 'BEGIN{FS=OFS="/"} {gsub(/ /,"_",$NF);gsub(/-|\(|\)/,"",$NF)} 1'

Explanation: echo passes the value /path/to/file 20-456 (1).jpg to awk on standard input. The BEGIN section sets both FS and OFS to /, so the last field ($NF) is the file name. The main block uses gsub to replace every space in $NF with _, then removes every -, ( and ) from $NF; the trailing 1 prints the modified line. Note that this removes the dash as well; drop -| from the second pattern to keep it.
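
For comparison, here is a sed-only sketch. It rests on one assumption about the cause: in a POSIX bracket expression a backslash is an ordinary character, not an escape, so in the question's first pattern the sequence \-\ is most likely read as a range from \ to \, which leaves no literal - in the set. In the second pattern the - is the last character before ] and is therefore literal. Exact behaviour can differ between sed implementations, so treat this as a likely explanation rather than a definitive diagnosis. The function name clean_with_sed is hypothetical and used only for illustration.

```shell
#!/bin/bash
# clean_with_sed is a hypothetical helper name used only for this sketch.
clean_with_sed() {
  # 1) Replace every space with an underscore.
  # 2) Delete every character that is not alphanumeric, '.', '_', '/' or '-'.
  #    No backslashes are used inside [...]; '-' is placed last so it cannot
  #    form a range, and '|' is the s-command delimiter so '/' needs no escaping.
  printf '%s\n' "$1" | sed -e 's/ /_/g' -e 's|[^0-9a-zA-Z._/-]||g'
}

clean_with_sed "/path/to/file 20-456 (1).jpg"
# prints /path/to/file_20-456_1.jpg
```

Unlike the awk version, this applies the cleanup to the directory part of the path too, which matches the original sed attempt.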

CodePudding user response:

You may get the result using string manipulation in Bash:

#!/bin/bash
path="/path/to/file 20-456 (1).jpg"
fldr="${path%/*}"   # Get the folder
file="${path##*/}"  # Get the file name
file="${file// /_}" # Replace spaces with underscores in filename
echo "$fldr/${file//[^[:alnum:]._-]/}" # Get the result

This yields /path/to/file_20-456_1.jpg.

Quick notes:

  • ${path%/*} - removes the shortest match of /* from the end of path, i.e. the last / and everything after it
  • ${path##*/} - removes the longest match of */ from the start of path, i.e. everything up to and including the last /
  • ${file// /_} - replaces all spaces with _ in file
  • ${file//[^[:alnum:]._-]/} - removes every character in file that is not alphanumeric, ., _ or -
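
The expansions above can be wrapped in a small function if you need to clean many paths. This is only a sketch: the name sanitize_path is hypothetical, and the branch for paths without any / is an addition not present in the snippet above (without it, ${path%/*} would return the whole string unchanged).

```shell
#!/bin/bash
# sanitize_path is a hypothetical name; it cleans only the file name part.
sanitize_path() {
  local path=$1 file
  file="${path##*/}"                 # strip everything up to the last /
  file="${file// /_}"                # spaces -> underscores
  file="${file//[^[:alnum:]._-]/}"   # drop anything but alnum, '.', '_', '-'
  if [[ $path == */* ]]; then
    printf '%s\n' "${path%/*}/$file" # re-attach the untouched folder part
  else
    printf '%s\n' "$file"            # bare file name, no folder to re-attach
  fi
}

sanitize_path "/path/to/file 20-456 (1).jpg"
# prints /path/to/file_20-456_1.jpg
```

To rename a file you could then run something like mv -- "$f" "$(sanitize_path "$f")", where f is a variable holding the original path.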