Get string between two regex

Time:02-24

I have a string like this (it comes from the body of a curl response; the server returns XML):

Package installed in 116ms.
  </log>
</data>
<status code="200">ok</status>

Sometimes this string contains an error status instead, for example:

<status code="500">java.lang.IllegalStateException: Archive not valid.</status>

I'm wondering what the best way would be to grep for the status code and, if it is not 200, print an error such as "Status code is 500 with error: java.lang.IllegalStateException: Archive not valid."

So far I have this:

sed -n "s|<status code=\"200\">\(.*\)|\1|p" test.log

But how do I get the status code as well with sed?

CodePudding user response:

To get the code attribute of the status node, pipe the XML into xmllint:

xmllint --xpath 'string(//status/@code)' -
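The same XPath approach can also return the message text, which is enough to build the error line asked for in the question. What follows is only a minimal sketch: it assumes xmllint is installed and that the response is well-formed XML with a single root element. The function name check_status and the <response> wrapper in the sample call are hypothetical, not taken from the question.

```shell
#!/usr/bin/env bash
# Sketch: report the status of an XML response using xmllint.
# Assumptions: xmllint is available and the input is well-formed XML.

# check_status (hypothetical name) reads the XML document from stdin.
check_status() {
    local xml code msg
    xml=$(cat)

    # string() returns an empty string when nothing matches the path
    code=$(xmllint --xpath 'string(//status/@code)' - <<< "${xml}" 2>/dev/null) || code=''
    msg=$(xmllint --xpath 'string(//status)' - <<< "${xml}" 2>/dev/null) || msg=''

    if [[ -z "${code}" ]]; then
        printf 'No status code found\n'
        return 2
    elif [[ "${code}" == "200" ]]; then
        printf 'Status code is 200\n'
    else
        printf 'Status code is %s with error: %s\n' "${code}" "${msg}"
        return 1
    fi
}

# Sample call; the <response> root element is made up for illustration
check_status <<< '<response><status code="500">java.lang.IllegalStateException: Archive not valid.</status></response>' || true
```

In a real script the curl output would be piped straight into the function, and its return value could decide whether to abort.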

CodePudding user response:

Assumptions:

  • (by whatever means) the <status>...</status> string has been extracted from the curl output
  • the string is formatted as in OP's examples (eg, no embedded line breaks between the <status> and </status> tags; no double quotes or left/right angle brackets in the data; nothing that would require a more complex regex pattern than used below)
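For the first assumption, one possible way to pull the <status> element out of the response is grep -o, which prints only the matching part of a line. This is just a sketch: the variable names response and status_line are made up for illustration, and the sample text stands in for the real curl output.

```shell
#!/usr/bin/env bash
# Sketch: isolate the <status>...</status> element from a larger response.
# Assumption: the curl body has already been captured in a variable.
response='Package installed in 116ms.
  </log>
</data>
<status code="500">java.lang.IllegalStateException: Archive not valid.</status>'

# -o prints only the text that matches the pattern, not the whole line
status_line=$(grep -o '<status code="[^"]*">[^<]*</status>' <<< "${response}")

printf '%s\n' "${status_line}"
```

This relies on the second assumption as well: the element sits on a single line and its text contains no angle brackets.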

One idea would be to compare the string against a regex and if there's a match then obtain the desired info from the BASH_REMATCH[] array, eg:

regex='<status code="([^"]+)">([^<]+)<'

for string in '<status code="200">ok</status>' '<status code="500">java.lang.IllegalStateException: Archive not valid.</status>' 'ignore this string'
do
    unset error_no error_msg

    printf "\n######## %s\n\n" "${string}"

    if [[ "${string}" =~ $regex ]]
    then
        error_no="${BASH_REMATCH[1]}"
        error_msg="${BASH_REMATCH[2]}"
    fi

    typeset -p BASH_REMATCH

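    # pass the values as arguments so a '%' or '\' in the data is not treated as printf formatting
    printf "\nerror_no  : %s\n" "${error_no}"
    printf "error_msg : %s\n" "${error_msg}"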
done

This generates:

######## <status code="200">ok</status>

declare -ar BASH_REMATCH=([0]="<status code=\"200\">ok<" [1]="200" [2]="ok")

error_no  : 200
error_msg : ok

######## <status code="500">java.lang.IllegalStateException: Archive not valid.</status>

declare -ar BASH_REMATCH=([0]="<status code=\"500\">java.lang.IllegalStateException: Archive not valid.<" [1]="500" [2]="java.lang.IllegalStateException: Archive not valid.")

error_no  : 500
error_msg : java.lang.IllegalStateException: Archive not valid.

######## ignore this string

declare -ar BASH_REMATCH=()

error_no  :
error_msg :
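
To get from the captured values to the message asked for in the question, the match could be wrapped in a small function. This is a sketch under the same assumptions as above; the name report_status and the return codes are illustrative choices, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: turn a <status> string into the message requested in the question.
regex='<status code="([^"]+)">([^<]+)<'

# report_status (hypothetical name) takes the <status> string as its argument.
report_status() {
    local string="$1"

    if [[ "${string}" =~ $regex ]]; then
        local code="${BASH_REMATCH[1]}"
        local msg="${BASH_REMATCH[2]}"

        if [[ "${code}" == "200" ]]; then
            printf 'Status code is 200\n'
        else
            printf 'Status code is %s with error: %s\n' "${code}" "${msg}"
            return 1
        fi
    else
        printf 'No status element found\n'
        return 2
    fi
}

report_status '<status code="200">ok</status>'
# prints: Status code is 200

report_status '<status code="500">java.lang.IllegalStateException: Archive not valid.</status>' || echo "exit status: $?"
# prints: Status code is 500 with error: java.lang.IllegalStateException: Archive not valid.
# prints: exit status: 1
```

Returning a non-zero status for anything other than 200 lets the caller branch on the result with a plain if or ||.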

NOTES:

  • the regex variable could likely be expanded to address additional formats but at some point you start re-inventing an XML parser