Bash String Format Comparison with Wildcards

Time:09-16

I am fairly new to bash scripting and am trying to echo only the lines that match a specific format. I have this code so far:

LINE=1
while read -r CURRENT_LINE
do
    if [[ $CURRENT_LINE == ??-?-??? ]]
    then
        echo "$LINE: $CURRENT_LINE"
    fi
    ((LINE++))
done < "./new-1.txt"

The text file contains number sequences, one per line. Some match the format "12-3-456", but others are in different formats, such as "123-89203-9420" or "123-456-7890". I can't quite understand why the if statement inside the while loop does not evaluate to true on lines that match the format. I've tried using * as well, but that gives me incorrect results.

Here are the contents of the text file new-1.txt. I want the script to output "Line 1: 11-1-111", but it doesn't output anything.

11-1-111
222-22-2222
333-33-3333
444-444-4444
555-555-5555

CodePudding user response:

Maybe try using the bash regex operator (=~), e.g.

LINE=1
while read -r CURRENT_LINE
do
    if [[ $CURRENT_LINE =~ [0-9][0-9]-[0-9]-[0-9][0-9][0-9] ]]
    then
        echo "$LINE: $CURRENT_LINE"
    fi
    ((LINE++))
done < new-1.txt

Or, if you are open to alternatives, using nl (number the lines) and sed:

nl -n ln new-1.txt | sed -n '/[[:digit:]]\{2\}-[[:digit:]]-[[:digit:]]\{3\}/p'

Or with GNU grep:

grep -noP "[0-9][0-9]-[0-9]-[0-9][0-9][0-9]" new-1.txt
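The patterns above are unanchored, so they would also accept a longer line that merely contains a matching substring. A minimal sketch of a whole-line match, assuming the sample data from the question (recreated here in a temporary file, which is an illustrative assumption and not part of the original answer):

```shell
#!/usr/bin/env bash
# Recreate the question's sample data in a temporary file (illustrative only).
sample=$(mktemp)
printf '%s\n' 11-1-111 222-22-2222 333-33-3333 444-444-4444 555-555-5555 > "$sample"

# -n prefixes each match with its line number, -x requires the whole line
# to match, and -E enables the {n} repetition syntax.
matches=$(grep -nxE '[0-9]{2}-[0-9]-[0-9]{3}' "$sample")
echo "$matches"

rm -f "$sample"
```

With -x there is no need to add ^ and $ to the pattern by hand.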

CodePudding user response:

In regex parlance, ? makes the preceding character or group optional, i.e. it may occur at most once, and zero occurrences are also allowed.

However, == is not the regex matching operator. Inside [[ ]] it compares the string against a glob pattern, where ? matches exactly one character of any kind, not only a digit. The regex matching operator is =~.

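So changing your if clause to the one below would do the job. Leave the regex unquoted: in bash 3.2 and later, a quoted right-hand side of =~ is matched as a literal string rather than as a regex.

[[ $CURRENT_LINE =~ ^[0-9]{2}-[0-9]{1}-[0-9]{3}$ ]]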

Here

  • The ^ anchors the match to the beginning of the string and $ to the end, so the whole line has to fit the pattern.
  • [0-9] denotes a range, here any single digit from zero to nine.
  • The {n} requires the preceding character or group to match exactly n times.

Note: You can also use the more verbose [[:digit:]] instead of [0-9].
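Putting the points above together, here is a minimal sketch of the full loop. The function name print_matches, the variable names, and the temporary sample file are assumptions made for illustration; only the regex and the data come from the question and answers. Storing the regex in a variable and expanding it unquoted is one common way to avoid the quoting pitfall.

```shell
#!/usr/bin/env bash
# Anchored regex: two digits, one digit, three digits, separated by hyphens.
# It is expanded unquoted below so that bash treats it as a regex.
pattern='^[0-9]{2}-[0-9]-[0-9]{3}$'

# print_matches FILE: print "N: line" for every line of FILE matching $pattern.
# (Hypothetical helper name, not from the original post.)
print_matches() {
    local line_no=1 current_line
    while IFS= read -r current_line; do
        if [[ $current_line =~ $pattern ]]; then
            echo "$line_no: $current_line"
        fi
        ((line_no++))
    done < "$1"
}

# Recreate the question's sample data in a temporary file (illustrative only).
sample=$(mktemp)
printf '%s\n' 11-1-111 222-22-2222 333-33-3333 444-444-4444 555-555-5555 > "$sample"

result=$(print_matches "$sample")
echo "$result"

rm -f "$sample"
```

On the sample data this prints only the first line, prefixed with its line number.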
