Return text from XML query - Python

Time:10-14

I need the response to a SOAP XML query to be in XML format, but it comes back as a single line of text with no indentation, and with the tag delimiters (< and >) changed to &lt; and &gt;

import requests

url = ''
xml = """<?xml version="1.0" encoding="utf-8"?>
            <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ss2="http://tempuri.org/SS20WS">
               <soapenv:Header/>
               <soapenv:Body>
                  <ss2:Consulta>
                     <ss2:XML><![CDATA[<CONFIE><CONSULTA DOCUMENTO="XXXXXXX" UF=""
                                LOGIN="XXXXXX" SENHA="XXXXX" TIPOCONSULTA="1" TIPORELATORIO="1"
                                TIPORETORNO="1" TIPODOCUMENTO="1" TIMEOUT="20000" DIAS="180" HTML="N" RECEITA="S">
                                </CONSULTA><MONITORE PERIODOMONIT="" EMAILMONIT=""
                                REFERENCIAMONIT=""></MONITORE></CONFIE>]]>
                     </ss2:XML>
                  </ss2:Consulta>
               </soapenv:Body>
            </soapenv:Envelope>"""
headers = {'Content-Type': 'text/xml'}

r = requests.post(url, data=xml, headers=headers)

print(r.text)

Excerpt from the response I receive:

&lt;NO_MUNICIPIO&gt;JAU&lt;/NO_MUNICIPIO&gt;&lt;CEP&gt;17210170&lt;/CEP&gt;&lt;IBGE&gt;3525300&lt;/IBGE&gt;&lt;IBGE-DISTRITO&gt;

CodePudding user response:

Not sure if this is the "right" way to do this but it seems to work:

from lxml import etree

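# One level of unescaping. The ampersand entries come last so that
# '&amp;lt;' becomes '&lt;' and is not unescaped a second time to '<'.
CONTROL = {
    '&#60;': '<',
    '&lt;': '<',
    '&#62;': '>',
    '&gt;': '>',
    '&#39;': "'",
    '&apos;': "'",
    '&#34;': '"',
    '&quot;': '"',
    '&nbsp;': ' ',
    '&#38;': '&',
    '&amp;': '&'
}

def fix_xml(xml):
    # dicts keep insertion order (Python 3.7+), so replacements run top to bottom
    for k, v in CONTROL.items():
        xml = xml.replace(k, v)
    return xml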

XML = """&lt;head&gt;&lt;/head&gt;"""

fxml = fix_xml(XML)
print(fxml)
etree.fromstring(fxml)

Note:

The etree import is used here just to validate the modified XML

Output:

<head></head>
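
As an alternative to a hand-written replacement table, the standard library's html.unescape performs the substitution in a single pass, so '&amp;lt;' becomes '&lt;' and is not unescaped twice. The following is only a sketch: it reuses the excerpt from the question, and the wrapper element name "root" is an arbitrary placeholder. Note that html.unescape also recognizes HTML named entities, which is broader than the five entities XML predefines.

```python
import html
import xml.etree.ElementTree as ET

# Excerpt of the escaped payload shown in the question
escaped = "&lt;NO_MUNICIPIO&gt;JAU&lt;/NO_MUNICIPIO&gt;&lt;CEP&gt;17210170&lt;/CEP&gt;"


def unescape_once(text):
    """Undo exactly one level of entity escaping."""
    return html.unescape(text)


fragment = unescape_once(escaped)

# The fragment has several top-level elements, so wrap it in a
# placeholder root (the name "root" is arbitrary) before parsing.
root = ET.fromstring(f"<root>{fragment}</root>")
municipio = root.findtext("NO_MUNICIPIO")
cep = root.findtext("CEP")
print(municipio, cep)
```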

CodePudding user response:

The excerpt you posted doesn't make clear which part of the response it comes from, but you could unescape the string, parse it into an ElementTree element, and navigate it from there:

import xml.etree.ElementTree as ET

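def process_xml(value):
    # Undo one level of escaping; '&amp;' goes last so that
    # '&amp;lt;' becomes '&lt;' rather than '<'
    value = value.replace('&gt;', '>')
    value = value.replace('&lt;', '<')
    value = value.replace('&amp;', '&')
    # Wrap in a single root because the fragment may hold several top-level
    # elements; this fails if value starts with an XML declaration
    value = f"<xml>{value}</xml>"
    return ET.fromstring(value)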

process_xml(r.text)

CodePudding user response:

The response you're receiving has been "double-escaped". At some stage it was a normal XML message, <NO_MUNICIPIO>JAU</NO_MUNICIPIO>, and then someone inserted this message as text into a wrapper XML document, which caused its markup to be escaped as &lt;...&gt;. That might or might not be a good design, but what matters now is how the text is extracted.

If you parse the wrapper document using an XML parser, you will find a text node containing the unescaped content, <NO_MUNICIPIO>JAU</NO_MUNICIPIO>, and you can pass this to a second XML parser to process it.

If you're seeing the &lt;...&gt; escape sequences, it means that somewhere along the line the XML is being read without an XML parser, because an XML parser always converts &lt; to <. And reading XML without using an XML parser is always bad news.
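
The two-stage parse described above can be sketched with the standard library. The envelope below is a made-up stand-in for the real response: the element names ConsultaResponse, ConsultaResult and RETORNO are assumptions (the actual names depend on the service's WSDL), and only the ss2 namespace URI is taken from the request in the question. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOAP response; element names are assumptions, not the real API
soap_response = """<?xml version="1.0" encoding="utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ss2="http://tempuri.org/SS20WS">
  <soapenv:Body>
    <ss2:ConsultaResponse>
      <ss2:ConsultaResult>&lt;RETORNO&gt;&lt;NO_MUNICIPIO&gt;JAU&lt;/NO_MUNICIPIO&gt;&lt;CEP&gt;17210170&lt;/CEP&gt;&lt;/RETORNO&gt;</ss2:ConsultaResult>
    </ss2:ConsultaResponse>
  </soapenv:Body>
</soapenv:Envelope>"""

NS = {"ss2": "http://tempuri.org/SS20WS"}


def extract_inner_xml(envelope_bytes, result_path=".//ss2:ConsultaResult"):
    """Parse the wrapper, then parse the text node it carries."""
    envelope = ET.fromstring(envelope_bytes)   # first parser: the SOAP envelope
    node = envelope.find(result_path, NS)
    if node is None or node.text is None:
        raise ValueError("result element not found or empty")
    # node.text is already unescaped by the parser, e.g. '<RETORNO>...</RETORNO>'
    return ET.fromstring(node.text)            # second parser: the inner document


inner = extract_inner_xml(soap_response.encode("utf-8"))
municipio = inner.findtext("NO_MUNICIPIO")

ET.indent(inner)                               # add indentation in place
pretty = ET.tostring(inner, encoding="unicode")
print(pretty)
```

With the code from the question, the same function would be called as extract_inner_xml(r.content), once result_path is adjusted to the element that actually carries the payload.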
