I'm trying to get just the text "Passwords do not match" between <Description> and </Description> from the variable $webout using regex. I'm brand new to regex, so please explain the solution in detail, including how to format it within the bash script, so I can learn.
Text from the $webout variable:
<?xml version="1.0" encoding="utf-16"?><interface-response><Command>SETDNSHOST</Command><Language>eng</Language><ErrCount>1</ErrCount><errors><Err1>Passwords do not match</Err1></errors><ResponseCount>1</ResponseCount><responses><response><Description>Passwords do not match</Description><ResponseNumber>304156</ResponseNumber><ResponseString>Validation error; invalid ; password</ResponseString></response></responses><Done>true</Done><debug><![CDATA[]]></debug></interface-response>
Script:
#!/bin/bash
url=ifconfig.me
pip=$(curl -s "${url}")
upip="https://dynamicdns.park-your-domain.com/update?host=[hostname]&domain=[domain.com]&password=[password]&ip=${pip}"
webout=$(curl -s "$upip")
echo $webout(<Description>(.*?)<)
#echo $(date '+%D %H:%M') $pip >> /users/username/documents/itworks.txt
I believe the problems I've run into are caused by the "/" in </Description>. I'm also having a very difficult time grasping regex syntax.
Thank you
Answer:
With xmlstarlet:
# this is a placeholder for your curl call
webout='<?xml version="1.0" encoding="utf-16"?><interface-response><Command>SETDNSHOST</Command><Language>eng</Language><ErrCount>1</ErrCount><errors><Err1>Passwords do not match</Err1></errors><ResponseCount>1</ResponseCount><responses><response><Description>Passwords do not match</Description><ResponseNumber>304156</ResponseNumber><ResponseString>Validation error; invalid ; password</ResponseString></response></responses><Done>true</Done><debug><![CDATA[]]></debug></interface-response>'
desc=$(
echo "$webout" \
| iconv -f utf-8 -t utf-16 \
| xmlstarlet sel -t -v //Description
)
declare -p desc
This outputs:
declare -- desc="Passwords do not match"
iconv was needed to avoid the "Document labelled UTF-16 but has UTF-8 content" error (caused by copy-pasting your sample data, which is UTF-8 text; YMMV).
Answer:
Here are two bash examples:
webout='<?xml version="1.0" encoding="utf-16"?><interface-response><Command>SETDNSHOST</Command><Language>eng</Language><ErrCount>1</ErrCount><errors><Err1>Passwords do not match</Err1></errors><ResponseCount>1</ResponseCount><responses><response><Description>Passwords do not match</Description><ResponseNumber>304156</ResponseNumber><ResponseString>Validation error; invalid ; password</ResponseString></response></responses><Done>true</Done><debug><![CDATA[]]></debug></interface-response>'
# sed: ":" replaces "/" as the s-command delimiter, so the "/" in </Description> needs no escaping; \1 is the \(.*\) group
sed -n "s:.*<Description>\(.*\)</Description>.*:\1:p" <<< "$webout"
# grep: -o prints only the matched part; -P enables Perl-compatible regex (GNU grep)
grep -oP '(?<=<Description>).*(?=</Description>)' <<< "$webout"
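To be honest, the second command (grep) uses a syntax I'm not too familiar with (I just picked it up here: https://unix.stackexchange.com/questions/13466/can-grep-output-only-specified-groupings-that-match). The (?<=<Description>) part is a lookbehind and the (?=</Description>) part is a lookahead: they require the tags to be there without including them in the match, so -o prints only the text in between.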
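Since the question asks how to do this inside the bash script itself, here is a minimal sketch using bash's built-in =~ operator, which needs no external tools (it assumes bash 3.2 or later). Bash uses POSIX extended regular expressions, which do not support the lazy .*? from the question, so [^<]* ("any run of characters other than <") is used instead. Note that "/" has no special meaning in a bash regex; it only needs escaping in tools such as sed when "/" is also the delimiter. The variable names re and desc are illustrative choices.

```shell
#!/usr/bin/env bash
# Sample response; in the real script this would be: webout=$(curl -s "$upip")
webout='<?xml version="1.0" encoding="utf-16"?><interface-response><Command>SETDNSHOST</Command><Language>eng</Language><ErrCount>1</ErrCount><errors><Err1>Passwords do not match</Err1></errors><ResponseCount>1</ResponseCount><responses><response><Description>Passwords do not match</Description><ResponseNumber>304156</ResponseNumber><ResponseString>Validation error; invalid ; password</ResponseString></response></responses><Done>true</Done><debug><![CDATA[]]></debug></interface-response>'

# Keep the pattern in a single-quoted variable so the shell does not try to
# interpret < > ( ) itself; an unquoted $re on the right of =~ is used as a regex.
# [^<]* means "any run of characters except <", so it stops at </Description>.
re='<Description>([^<]*)</Description>'

if [[ $webout =~ $re ]]; then
    # BASH_REMATCH[0] holds the whole match, BASH_REMATCH[1] the first (...) group.
    desc=${BASH_REMATCH[1]}
    echo "$desc"
else
    echo "No <Description> element found" >&2
fi
```

This prints Passwords do not match. Putting the pattern in a variable is the usual way to avoid quoting trouble: if the regex itself were quoted on the right of =~, bash would compare it as a literal string instead.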
However, when parsing XML, you are better off using an XML parser rather than regex.
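To make that advice concrete: the regex commands above rely on there being exactly one <Description> element. The sketch below uses a made-up input with two of them (hypothetical, not a real API response) to show how the greedy .* over-matches, while the lazy .*? from the question stops at the first closing tag. It assumes GNU grep built with PCRE support, since -P is not available in every grep implementation.

```shell
#!/usr/bin/env bash
# Hypothetical input with two <Description> elements on one line (made up for illustration).
xml='<r><Description>first</Description><Other/><Description>second</Description></r>'

# Greedy .* grabs as much as possible, so it runs to the LAST </Description>.
greedy=$(grep -oP '(?<=<Description>).*(?=</Description>)' <<< "$xml")

# Lazy .*? grabs as little as possible; -o then prints each match on its own line.
lazy=$(grep -oP '(?<=<Description>).*?(?=</Description>)' <<< "$xml")

printf 'greedy: %s\n' "$greedy"
printf 'lazy:\n%s\n' "$lazy"
```

The greedy result is first</Description><Other/><Description>second, while the lazy one is first and second on separate lines. Even the lazy version only works while the markup stays this simple; a parser does not depend on how the elements happen to be laid out.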
Here's a third option using an XML parser (xmllint):
xmllint --xpath '//Description/text()' - <<< "$webout"
Note: I had to change utf-16 to utf-8 in the XML declaration of $webout to make xmllint happy.
After reading the comments and other answers on this page, I learned that iconv can convert the text from UTF-8 to UTF-16, so the declaration can stay as it is. Here's an improved version:
xmllint --xpath '//Description/text()' <( iconv -f utf-8 -t utf-16 <<< "$webout" )