How to extract multiple substrings surrounded by double-quotes from a longer string

Time:02-17

I am trying to process the output of another script, which looks something like this:

xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]

I want to find the first substring surrounded by double quotes, confirm its value (i.e. "ABCD"), and then put all the remaining quoted substrings (their number varies) into an array.

I've looked around for an answer, but the references I've found only cover extracting a single substring, not several.

CodePudding user response:

This Shellcheck-clean demonstration program shows a way to do it with Bash's own regular expression matching ([[ str =~ regex ]]):

#! /bin/bash -p

input='xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'

# Regular expression to match strings with double quoted substrings.
# The first parenthesized subexpression matches the first string in quotes.
# The second parenthesized subexpression matches the entire portion of the
# string after the first quoted substring.
quotes_rx='^[^"]*"([^"]*)"(.*)$'

if [[ $input =~ $quotes_rx ]]; then
    if [[ ${BASH_REMATCH[1]} == ABCD ]]; then
        tmpstr=${BASH_REMATCH[2]}
    else
        echo "First quoted substring is not 'ABCD'" >&2
        exit 1
    fi
else
    echo 'Input does not contain any quoted substrings' >&2
    exit 1
fi

quoted_strings=()
while [[ $tmpstr =~ $quotes_rx ]]; do
    quoted_strings+=( "${BASH_REMATCH[1]}" )
    tmpstr=${BASH_REMATCH[2]}
done

declare -p quoted_strings
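
If the same loop is needed in more than one place, it could be wrapped in a function. The following is only a sketch under some assumptions: the names extract_quoted, all_quoted, first and rest are made up for this example and are not part of the program above, and the shortened input is just for illustration. It collects every quoted substring and leaves the check of the first one to the caller.

```shell
#!/bin/bash

# Hypothetical helper (not part of the answer above): collect every
# double-quoted substring of $1 into the global array 'all_quoted'.
extract_quoted() {
    local str=$1
    local rx='^[^"]*"([^"]*)"(.*)$'
    all_quoted=()
    while [[ $str =~ $rx ]]; do
        all_quoted+=( "${BASH_REMATCH[1]}" )   # text inside the quotes
        str=${BASH_REMATCH[2]}                 # continue with the remainder
    done
}

extract_quoted 'xxx "ABCD" xxx xxx ["EFGH","IJKL"]'

first=${all_quoted[0]}           # first quoted substring, to be validated
rest=( "${all_quoted[@]:1}" )    # every quoted substring after the first

declare -p first rest
```

The caller can then compare $first against the expected value before using rest.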

CodePudding user response:

This awk command checks the content between the first pair of " characters and, if it matches, prints everything between each subsequent pair.

awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}'
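
To see why the loop starts at field 4 and advances two fields at a time, it can help to print every field awk produces when " is the field separator. This is a small sketch with a shortened input (the variable name fields is only for this example): even-numbered fields hold the text inside quotes, and odd-numbered fields hold whatever sits between them.

```shell
#!/bin/bash

# Print each field with its index, using " as the field separator.
fields=$(awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }' \
    <<< 'xxx "ABCD" xxx xxx ["EFGH","IJKL"]')

printf '%s\n' "$fields"
# 1=<xxx >
# 2=<ABCD>
# 3=< xxx xxx [>
# 4=<EFGH>
# 5=<,>
# 6=<IJKL>
# 7=<]>
```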

To populate a bash array, you could use mapfile and process substitution:

mapfile -t arr < <( … )

Testing:

mapfile -t arr < <(
  awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}' \
  <<< 'xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'
)
printf '%s\n' "${arr[@]}"
EFGH
IJKL
MNOP
QRST
UVWX
YZ12
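
When the first quoted value is not "ABCD", the awk command prints nothing, so the array ends up empty. A minimal sketch of checking for that, assuming an input whose first quoted value is "WXYZ" (chosen only for this example):

```shell
#!/bin/bash

# Example input whose first quoted value does NOT match "ABCD".
input='xxx "WXYZ" xxx xxx ["EFGH","IJKL"]'

mapfile -t arr < <(
  awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}' <<< "$input"
)

# An empty array means nothing was extracted.
if (( ${#arr[@]} == 0 )); then
    echo "No substrings extracted" >&2
fi
```

Note that the array is also empty when the first value matches but no further quoted substrings follow, so this check alone cannot tell the two cases apart.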