I have a file that contains an HTML response, as shown below:
<d:ChangeType>SSCR</d:ChangeType>
<d:Status>Success</d:Status>
<d:ShortDescription>API </d:ShortDescription>
<d:CycleTypeId>8000006005</d:CycleTypeId>
<d:RfcNumber>1200000910</d:RfcNumber>
<d:ExtrefNumber>API External</d:ExtrefNumber>
The requirement is to fetch the number between <d:RfcNumber> and </d:RfcNumber>, i.e. 1200000910 in this example, and store it in a variable.
I'm trying to use sed like:
sed 's/1200000910.*//' test2.html
but it is not providing me the expected result.
Any help on this will be appreciated.
CodePudding user response:
1st solution: Since you are using sed, I am assuming you are working in a shell, so you could use awk here. Set the field separators to <d:RfcNumber> and <\\/d:RfcNumber> (the closing tag, with the slash escaped) for all lines. In the main program, check whether the number of fields is greater than 2; if so, print the 2nd field and exit.
var=$(awk -F'<d:RfcNumber>|<\\/d:RfcNumber>' 'NF>2{print $2;exit}' Input_file)
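To try the field-separator approach end to end, here is a minimal sketch. It assumes the response looks exactly like the sample in the question; the temporary file and the "not found" message are only there for illustration, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Hypothetical sample file reproducing the response from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<d:ChangeType>SSCR</d:ChangeType>
<d:Status>Success</d:Status>
<d:ShortDescription>API </d:ShortDescription>
<d:CycleTypeId>8000006005</d:CycleTypeId>
<d:RfcNumber>1200000910</d:RfcNumber>
<d:ExtrefNumber>API External</d:ExtrefNumber>
EOF

# Split each line on the opening or closing tag; only the RfcNumber line
# yields more than 2 fields, and the value sits in field 2.
var=$(awk -F'<d:RfcNumber>|</d:RfcNumber>' 'NF>2{print $2; exit}' "$sample")
rm -f "$sample"

# An empty variable means the tag was not present in the file.
if [ -n "$var" ]; then
  echo "RfcNumber: $var"
else
  echo "RfcNumber not found" >&2
fi
```

Checking for an empty result before using the variable avoids silently passing a blank value to later commands.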
2nd solution: Using GNU awk's match function to capture the value between the tags.
var=$(awk 'match($0,/^<d:RfcNumber>([^<]*)<\/d:RfcNumber>/,arr){print arr[1];exit}' Input_file)
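The three-argument form of match is a GNU awk extension. If gawk is not available, a rough equivalent (a sketch, not part of the original answer) can use the two-argument match together with RSTART and RLENGTH, which POSIX awk provides. The sample file below is hypothetical and holds three lines from the question.

```shell
#!/bin/sh
# Hypothetical sample input: three lines taken from the question's file.
sample=$(mktemp)
printf '%s\n' \
  '<d:CycleTypeId>8000006005</d:CycleTypeId>' \
  '<d:RfcNumber>1200000910</d:RfcNumber>' \
  '<d:ExtrefNumber>API External</d:ExtrefNumber>' > "$sample"

# match() sets RSTART/RLENGTH to the span of the whole element;
# trimming the tag lengths from both ends leaves only the value.
var=$(awk '
  match($0, /<d:RfcNumber>[^<]*<\/d:RfcNumber>/) {
    open_len  = length("<d:RfcNumber>")
    close_len = length("</d:RfcNumber>")
    print substr($0, RSTART + open_len, RLENGTH - open_len - close_len)
    exit
  }' "$sample")
rm -f "$sample"
echo "$var"
```

Because the pattern is not anchored with ^, this variant also tolerates leading whitespace before the tag.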
3rd solution: Or with sed. Please try the following code, which uses the -E option of GNU sed to enable ERE (extended regular expressions).
var=$(sed -E -n 's/^<d:RfcNumber>([^<]*)<\/d:RfcNumber>/\1/p' Input_file)
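As for why the attempt in the question did not work: sed 's/1200000910.*//' deletes the number (and the rest of that line) and, without -n, still prints every line of the file. A sketch of a variant that uses only basic regular expressions, so it does not depend on -E, is shown below. The sample file is hypothetical, and the RfcNumber line is indented on purpose to show that leading whitespace is tolerated.

```shell
#!/bin/sh
# Hypothetical sample input; the RfcNumber line is deliberately indented.
sample=$(mktemp)
printf '%s\n' \
  '<d:Status>Success</d:Status>' \
  '  <d:RfcNumber>1200000910</d:RfcNumber>' \
  '<d:ExtrefNumber>API External</d:ExtrefNumber>' > "$sample"

# -n suppresses the default output; the p flag prints only lines where
# the substitution succeeded. "|" is the delimiter, so "/" needs no escape.
var=$(sed -n 's|.*<d:RfcNumber>\([^<]*\)</d:RfcNumber>.*|\1|p' "$sample")
rm -f "$sample"
echo "$var"
```

The leading and trailing .* replace the whole line with the captured group, so nothing besides the number reaches the variable.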
CodePudding user response:
...the split way:
#Read your file content
$response = (gc C:\tmp\testdata.txt)
$rfc = ($response -split "<d:RfcNumber>|</d:RfcNumber>")[5]
or regex (but this could probably be optimized):
#Read your file content
$response = (gc C:\tmp\testdata.txt)
($response | select-string "<d:RfcNumber>\d{10}").matches.groups.value -replace "<d:RfcNumber>"
or you treat it as XML:
[xml]$xml = '<root>' + (($response -replace "d:") -join $null) + '</root>'
$xml.root.RfcNumber