I am trying to process the output of another script that looks a little something like this:
xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]
What I want to do is to be able to find the first substring surrounded by quotes, confirm the value (i.e. "ABCD") and then take all the remaining substrings (there is a variable number of substrings) and put them in an array.
I've been looking around for the answer to this but the references I've been able to find involve just extracting one substring and not multiples.
CodePudding user response:
This Shellcheck-clean demonstration program shows a way to do it with Bash's own regular expression matching ([[ str =~ regex ]]):
#! /bin/bash -p
input='xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'
# Regular expression to match strings with double quoted substrings.
# The first parenthesized subexpression matches the first string in quotes.
# The second parenthesized subexpression matches the entire portion of the
# string after the first quoted substring.
quotes_rx='^[^"]*"([^"]*)"(.*)$'
if [[ $input =~ $quotes_rx ]]; then
    if [[ ${BASH_REMATCH[1]} == ABCD ]]; then
        tmpstr=${BASH_REMATCH[2]}
    else
        echo "First quoted substring is not 'ABCD'" >&2
        exit 1
    fi
else
    echo 'Input does not contain any quoted substrings' >&2
    exit 1
fi
# Repeatedly peel the next quoted substring off the front of the remainder.
quoted_strings=()
while [[ $tmpstr =~ $quotes_rx ]]; do
    quoted_strings+=( "${BASH_REMATCH[1]}" )
    tmpstr=${BASH_REMATCH[2]}
done
declare -p quoted_strings
- See mklement0's excellent answer to "How do I use a regex in a shell script?" for information about Bash's regular expression matching.
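To make the two capture groups concrete, here is a minimal sketch that reuses the same quotes_rx on a shortened sample line and prints what each group holds. The variable names first and rest are only illustrative, and the input is abbreviated from the one in the question.

```bash
#!/bin/bash
# Same regex as in the program above:
#   group 1 = first quoted substring, group 2 = everything after it.
quotes_rx='^[^"]*"([^"]*)"(.*)$'
input='xxx "ABCD" xxx ["EFGH","IJKL"]'

# A single match splits the line into "first quoted value" and "the rest".
if [[ $input =~ $quotes_rx ]]; then
    first=${BASH_REMATCH[1]}    # ABCD
    rest=${BASH_REMATCH[2]}     # ' xxx ["EFGH","IJKL"]' (leading space kept)
    printf 'first: %s\n' "$first"
    printf 'rest: %s\n' "$rest"
fi

# Matching the remainder again and again yields one quoted value per pass
# and stops once no double quote is left in the remainder.
tmpstr=$rest
while [[ $tmpstr =~ $quotes_rx ]]; do
    printf 'captured: %s\n' "${BASH_REMATCH[1]}"   # EFGH, then IJKL
    tmpstr=${BASH_REMATCH[2]}
done
```

Because the first part of the regex is [^"]*, each pass can only consume text up to the next double quote, which is why the loop walks the quoted values in order.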
CodePudding user response:
This awk command tests the content between the first pair of " characters, and extracts everything between subsequent pairs.
awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}'
To populate a bash array, you could use mapfile and process substitution:
mapfile -t arr < <( … )
Testing:
mapfile -t arr < <(
    awk -v q="ABCD" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}' \
<<< 'xxx "ABCD" xxx xxx ["EFGH","IJKL","MNOP","QRST","UVWX","YZ12"]'
)
printf '%s\n' "${arr[@]}"
EFGH
IJKL
MNOP
QRST
UVWX
YZ12
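Note that the awk command prints nothing when the first quoted value does not equal q, so the array simply ends up empty. A rough sketch of one way to detect that, assuming Bash 4 or later for mapfile; the function name extract_after is hypothetical and not part of the answer above:

```bash
#!/bin/bash
# Hypothetical helper: fills the global array "arr" with the quoted
# substrings that follow the first one, provided the first one equals $1.
# Returns non-zero when nothing was extracted, which covers both a
# mismatched first value and a line with no further quoted substrings.
extract_after() {
    local expected=$1 line=$2
    mapfile -t arr < <(
        awk -v q="$expected" -F'"' '$2==q{for (i=4; i<=NF; i+=2) print $i}' \
            <<< "$line"
    )
    (( ${#arr[@]} > 0 ))
}

line='xxx "ABCD" xxx xxx ["EFGH","IJKL"]'

if extract_after ABCD "$line"; then
    printf '%s\n' "${arr[@]}"
fi

if ! extract_after WXYZ "$line"; then
    echo 'First quoted substring did not match, or nothing followed it' >&2
fi
```

One caveat: awk interprets backslash escapes in values passed with -v, so this sketch assumes the expected value contains no backslashes.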