Stripping leading zeros but leaving a single 0

Let me start off by saying that I'm new to bash, so I would appreciate simple explanations in the answers you give.

I've got the following block of code:

name="Chapter 0000.cbz (sub s2)"

s=$(echo $name | grep -Eo '[0-9]+([.][0-9]+)?' | tr '\n' ' ' | sed 's/^0*//')

echo $s

readarray -d " " -t myarr <<< "$s"

if [[ $(echo "${myarr[0]} < 100 && ${myarr[0]} >= 10" | bc) -ne 0 ]]; then
    myarr[0]="0${myarr[0]}"
elif [[ $(echo "${myarr[0]} < 10" | bc) -ne 0 ]]; then
    myarr[0]="00${myarr[0]}"
fi

newName="Chapter ${myarr[0]}.cbz"

echo $newName

which (in this case) would end up spitting out:

 2
(standard_in) 1: syntax error
(standard_in) 1: syntax error
Chapter .cbz

(I'm fairly certain that the syntax errors are because ${myarr[0]} is null when doing the comparisons)

This is not the output I want. I want the code to strip leading 0's but leave a single 0 if the number is all 0's.

So the part that really needs to change is sed 's/^0*//', but I'm not sure how to change it.

(Expected outputs:

   in                   --->  out
1) chapter 8.cbz        --->  Chapter 008.cbz
2) chapter 1.3.cbz      --->  Chapter 001.3.cbz
3) _23 (sec 2).cbz      --->  Chapter 023.cbz
4) chapter 00009.cbz    --->  Chapter 009.cbz
5) chap 0000112.5.cbz   --->  Chapter 112.5.cbz

So far the code I have works for cases 1-3 but not for the leading-zero cases (4-5).)

CodePudding user response：

I think you could produce the table of results with sed alone:

sed '
    s/^[^0-9]*/000/
    s/[^0-9.].*$/./
    s/\.*$/.cbz/
    s/^0*\([0-9]\{3\}\)/Chapter \1/
' <<'EOD'
chapter 8.cbz
chapter 1.3.cbz
_23 (sec 2).cbz
chapter 00009.cbz
chap 0000112.5.cbz
chap 04567.cbz
EOD

The first command strips everything before the first number and prepends zeros to ensure there are at least three digits.
The second command replaces everything after the number with a single period.
Because the number may contain a period but may also be followed by a period, a third command replaces all the trailing periods with the desired extension.
The final command removes the longest run of leading zeroes that leaves (at least) three digits (I added an extra test case to demonstrate).
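
To see how the four commands build on each other, here is a minimal sketch that traces one of the test inputs through them one stage at a time. The trace_stage helper is a hypothetical name introduced only for this illustration; the sed commands themselves are the ones from the script above.

```shell
#!/bin/bash
# Hypothetical helper: apply only the first n of the four substitutions
# to a single file name, so each intermediate result can be inspected.
trace_stage() {
    local n=$1 name=$2
    local cmds=(
        's/^[^0-9]*/000/'                   # drop the prefix, prepend 000
        's/[^0-9.].*$/./'                   # cut at the first char that is neither digit nor period
        's/\.*$/.cbz/'                      # trailing periods become the extension
        's/^0*\([0-9]\{3\}\)/Chapter \1/'   # keep at least three digits
    )
    local args=() i
    for (( i = 0; i < n; i++ )); do
        args+=(-e "${cmds[i]}")
    done
    sed "${args[@]}" <<< "$name"
}

for n in 1 2 3 4; do
    trace_stage "$n" '_23 (sec 2).cbz'
done
```

For this input the four stages should print 00023 (sec 2).cbz, then 00023., then 00023.cbz, and finally Chapter 023.cbz.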

The result of running this would be:

Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz
Chapter 4567.cbz

CodePudding user response：

In pure bash:

#!/bin/bash

for name in 'chapter 8.cbz' 'chapter 1.3.cbz' '_23 (sec 2).cbz' 'chapter 00009.cbz' 'chap 0000112.5.cbz'; do

##### The relevant part #####

[[ $name =~ ^[^0-9]*([0-9]+)[^.]*(\..*)$ ]]

chapter=$(( 10#${BASH_REMATCH[1]} ))
suffix=${BASH_REMATCH[2]}

printf 'Chapter %03d%s\n' "$chapter" "$suffix"

#############################

done

Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz

notes:

[[ =~ ]] is the way to use an ERE regex in bash. The one that I wrote has two capturing groups: the first one captures the first sequence of digits that appears (which should be the chapter number), and the second one captures everything from the first dot onward (dot included).
$(( 10#... )) converts a zero-prefixed decimal to a normal decimal. Without the 10# prefix, bash would treat a number with a leading zero as octal.
printf '%03d' formats a number as a decimal of at least 3 digits, padding the left with zeros when it is shorter.
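
As a quick illustration of the last two notes, the sketch below isolates the base conversion and the padding. pad3 is a hypothetical helper name used only here, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: turn a digit string into a number padded to
# at least three digits.
pad3() {
    # 10# forces base 10, so "00009" is read as 9 instead of being
    # rejected as an invalid octal literal.
    printf '%03d\n' "$(( 10#$1 ))"
}

pad3 00009     # 009
pad3 0000      # 000
pad3 0000112   # 112
pad3 4567      # 4567
```

Note that an all-zero input such as 0000 comes out as 000, which covers the "leave a single 0" case from the question without any special handling.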

CodePudding user response：

Using sed

$ sed 's/[^0-9]*0\+\?\([0-9]\{1,\}\)[^.]*\(\..*\)/Chapter 00\1\2/;s/0\+\([0-9]\{3,\}\)/\1/' file
Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz

s/[^0-9]*0\+\?\([0-9]\{1,\}\)[^.]*\(\..*\)/Chapter 00\1\2/ - Strip everything before the number along with its leading zeros, then put Chapter and two zeros in front of the digits that remain.

s/0\+\([0-9]\{3,\}\)/\1/ - Once again, strip the excess zeros, ensuring that at least three digits remain before the period.
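
For the substitution the question literally asks about (strip leading zeros but keep a lone 0), the same idea of letting the regex back off can be applied with a one-digit minimum. This is only a sketch: strip_zeros is a hypothetical helper name, and it assumes the number sits at the very start of the string, as it does in the question's pipeline.

```shell
#!/bin/bash
# Hypothetical helper: drop leading zeros but always keep one digit.
strip_zeros() {
    # 0* is greedy, but it gives back one character when needed so that
    # \([0-9]\) can still capture a digit. An all-zero run keeps its last 0.
    sed 's/^0*\([0-9]\)/\1/' <<< "$1"
}

strip_zeros 0000        # 0
strip_zeros 00009       # 9
strip_zeros 0000112.5   # 112.5
strip_zeros '0000 2'    # 0 2
```

Swapping this expression in for sed 's/^0*//' in the question's code should leave ${myarr[0]} as 0 rather than empty, which is what the bc comparisons need.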

CodePudding user response：

Here is an awk script that does the trick:

script.awk

{
  str = "000" gensub("(^[[:digit:]]+\\.?[[:digit:]]*)( \\([^)]+\\))?(\\.cbz)", "\\1", "g", RT);
  str = gensub("(^[[:digit:]]+)([[:digit:]]{3})(.*$)", "\\2\\3", "g", str);
  printf("Chapter %s.cbz\n", str);
}

Test input.1.txt

1) chapter 8.cbz   
2) chapter 1.3.cbz 
3) _23 (sec 2).cbz 
4) chapter 00009.cbz
5) chap 0000112.5.cbz

Output:

awk -f script.awk RS='[[:digit:]]+[\\.]?[[:digit:]]*( \\([^)]+\\))?\\.cbz' input.1.txt
Chapter 008.cbz
Chapter 001.3.cbz
Chapter 023.cbz
Chapter 009.cbz
Chapter 112.5.cbz