Bash regex pattern match syntax with space?

Time:02-12

Input example:

raw_list='
libreoffice-impress.desktop - LibreOffice Impress
joplin.desktop - Joplin Notes
libreoffice-base.desktop - LibreOffice Base
yelp.desktop - Help
org.gnome.gedit.desktop - Text Editor
'

I want to separate each app's .desktop file from its display name. For example:

name='Joplin Notes'
path='joplin.desktop'

My regex:

parse_app_list(){
    name=''
    path=''
    for i in "${raw_list[@]}"; do
        echo "$i"
        [[ $name =~  "$i.desktop[:space:]-[:space:].*" ]]
        [[ $path =~  ".*$i.desktop" ]]

        echo "$name" 
        echo "$path"
    done
}

It's not even close. What would the correct syntax be?

CodePudding user response:

When using =~ as a test operator, the bash manual states:

<snip> the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)) <snip> Any part of the pattern may be quoted to force the quoted portion to be matched as a string

This means that

[[ "$var" =~ regex ]]       # matches regex
[[ "$var" =~ "string" ]]    # matches string
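A quick way to see the difference is to test a string that contains no literal dot. This is only a minimal sketch; the variable names var, unquoted and quoted are made up for the illustration, and it assumes a bash recent enough (3.2 or later, default compatibility settings) for quoting to force a literal match.

```shell
#!/usr/bin/env bash
# 'abc' contains no literal dot, so it can only match 'a.c'
# when '.' is treated as the regex wildcard.
var='abc'

if [[ $var =~ a.c ]]; then unquoted='match'; else unquoted='no match'; fi
if [[ $var =~ "a.c" ]]; then quoted='match'; else quoted='no match'; fi

echo "unquoted a.c: $unquoted"   # '.' acts as a wildcard
echo "quoted a.c:   $quoted"     # '.' is a literal dot
```

The unquoted pattern reports a match and the quoted one does not.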

In the OP's case, the tests should read:

[[ $name =~  "$i.desktop"[[:space:]]-[[:space:]].* ]]
[[ $path =~  .*"$i.desktop" ]]

Here we made the following modifications:

  • unquote the regular expression as a whole, so that it is interpreted as a regex and not as a string
  • quote the part "$i.desktop" so that it is matched as a literal string. Otherwise any dot or other special regex character in it would be interpreted as a regex operator.
  • [:space:] is a character class and must appear inside a bracket expression, i.e. [[:space:]]. On its own, [:space:] is just a bracket expression that matches any one of the characters :, a, c, e, p, s
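To check these points, the corrected test can be run against a single sample line. This is a sketch under assumptions: line is one entry copied from the question, i is set to 'joplin' only for illustration (the question never says what $i should hold), and the result variables are hypothetical.

```shell
#!/usr/bin/env bash
line='joplin.desktop - Joplin Notes'   # one sample entry from the question
i='joplin'                             # assumed value, for illustration only

# Corrected form: literal part quoted, bracket expressions left unquoted.
if [[ $line =~ "$i.desktop"[[:space:]]-[[:space:]].* ]]; then fixed='match'; else fixed='no match'; fi

# Original form: everything quoted, so the whole right-hand side is a literal string.
if [[ $line =~ "$i.desktop[:space:]-[:space:].*" ]]; then original='match'; else original='no match'; fi

# Because "$i.desktop" is quoted, its dot only matches a real dot.
if [[ 'joplinXdesktop - Joplin Notes' =~ "$i.desktop"[[:space:]]- ]]; then dot='wildcard'; else dot='literal'; fi

echo "corrected test: $fixed"
echo "original test:  $original"
echo "dot is treated as: $dot"
```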

CodePudding user response:

A few issues with the current code:

  • raw_list=' libreoffice-impress.desktop ... Text Editor' is a variable containing one multi-line string; raw_list is not an array of path - name pairs, though bash does allow a scalar "${raw_list}" to be referenced as "${raw_list[0]}"
  • "${raw_list[@]}" is an array reference; since raw_list is a scalar, the loop is processed once with i="${raw_list}" (or i="${raw_list[0]}"); try running for i in "${raw_list[@]}"; do echo "loop:$i"; done to confirm this
  • the variables name and path are never set to anything (other than '') so the tests will always fail
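The first two points can be confirmed with a short script. It is only a sketch: the input is shortened from the question, and count, entries and line are names invented here. It also shows one possible way (an assumption, not the only option) to turn the multi-line string into a real array.

```shell
#!/usr/bin/env bash
# Shortened copy of the multi-line string from the question.
raw_list='
joplin.desktop - Joplin Notes
yelp.desktop - Help
'

# A scalar expands to a single element, so this loop body runs once.
count=0
for i in "${raw_list[@]}"; do
    count=$((count + 1))
done
echo "iterations over the string: $count"

# Build an array with one element per non-empty line.
entries=()
while IFS= read -r line; do
    if [[ -n $line ]]; then
        entries+=("$line")
    fi
done <<< "$raw_list"
echo "array entries: ${#entries[@]}"
```

In bash 4 and later, mapfile -t could replace the while loop, although blank lines would then still need filtering.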

Assumptions:

  • initial data can be reformatted as array entries
  • each array entry contains exactly one ' - ' separator (space, hyphen, space)

Setup:

raw_list=(
'libreoffice-impress.desktop - LibreOffice Impress'
'joplin.desktop - Joplin Notes'
'libreoffice-base.desktop - LibreOffice Base'
'yelp.desktop - Help'
'org.gnome.gedit.desktop - Text Editor'
)

regex='(.*) - (.*)'     # whatever matches the contents inside the parens will be
                        # our 1st and 2nd entries in the `BASH_REMATCH[]` array
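To make the BASH_REMATCH comment concrete, here is a minimal sketch that applies the same regex to one entry and prints each array slot. The second entry, with two separators, is hypothetical; it is there only to show why the single-separator assumption matters, since the first (.*) is greedy.

```shell
#!/usr/bin/env bash
regex='(.*) - (.*)'

line='joplin.desktop - Joplin Notes'
if [[ $line =~ $regex ]]; then
    whole=${BASH_REMATCH[0]}   # the entire matched text
    path=${BASH_REMATCH[1]}    # first parenthesised group
    name=${BASH_REMATCH[2]}    # second parenthesised group
fi
echo "whole=$whole"
echo "path=$path"
echo "name=$name"

# Hypothetical entry with two separators: the first group takes as much
# as it can, so the split lands on the last ' - '.
line='a.desktop - Foo - Bar'
if [[ $line =~ $regex ]]; then
    greedy_path=${BASH_REMATCH[1]}
    greedy_name=${BASH_REMATCH[2]}
fi
echo "greedy_path=$greedy_path"
echo "greedy_name=$greedy_name"
```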

One idea using bash regex matching to parse the pairs for us:

while read -r line
do
    path=''
    name=''

    [[ "${line}" =~ $regex ]] && \
    path="${BASH_REMATCH[1]}" && \
    name="${BASH_REMATCH[2]}"

    echo "############## ${line}"
    echo "path=${path}"
    echo "name=${name}"
    echo ""
done < <(printf "%s\n" "${raw_list[@]}")

Another idea using parameter expansions to parse the pairs for us:

for line in "${raw_list[@]}"
do
    path="${line% - *}"
    name="${line#* - }"

    echo "############## ${line}"
    echo "path=${path}"
    echo "name=${name}"
    echo ""
done

Both of these generate:

############## libreoffice-impress.desktop - LibreOffice Impress
path=libreoffice-impress.desktop
name=LibreOffice Impress

############## joplin.desktop - Joplin Notes
path=joplin.desktop
name=Joplin Notes

############## libreoffice-base.desktop - LibreOffice Base
path=libreoffice-base.desktop
name=LibreOffice Base

############## yelp.desktop - Help
path=yelp.desktop
name=Help

############## org.gnome.gedit.desktop - Text Editor
path=org.gnome.gedit.desktop
name=Text Editor
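
If the end goal is to get the .desktop file for a given display name, the parameter-expansion approach can be wrapped in a small lookup. This is a sketch only: path_for_name is a hypothetical helper that is not part of the question or of bash, and the array is shortened from the setup.

```shell
#!/usr/bin/env bash
raw_list=(
'libreoffice-impress.desktop - LibreOffice Impress'
'joplin.desktop - Joplin Notes'
'yelp.desktop - Help'
)

# Hypothetical helper: print the .desktop file whose display name equals $1.
path_for_name() {
    local wanted=$1 line
    for line in "${raw_list[@]}"; do
        # ${line#* - } is the display name, ${line% - *} is the .desktop file
        if [[ ${line#* - } == "$wanted" ]]; then
            printf '%s\n' "${line% - *}"
            return 0
        fi
    done
    return 1    # no entry with that name
}

path=$(path_for_name 'Joplin Notes')
echo "path=$path"
```

The function returns a non-zero status when no entry matches, so callers can test it with if.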