What is the most efficient way to receive XML data until a delimiter is reached

Time:12-31

Currently, I'm having an issue with a basic socket server. I have no control over the client for this server, and the client sends XML messages of unknown length delimited by a known set of characters. The issue can be reproduced with the following:

import socket
server_address = ('192.168.2.47', 10000)

#client
def client():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server_address)
    # sendall() takes bytes and keeps sending until the whole message is written
    sock.sendall(b'<messageBody><ew32f/><dwadwa/></messageBody>')
    sock.sendall(b'<messageBody><dwaaw/><fewwfe/></messageBody>')
    sock.sendall(b'<messageBody><ewqf3x/><awdwad2/></messageBody>')
    # the socket will stay connected so long as the client continues sending data which could be days or more

#server
def server():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(server_address)
    sock.listen(1)  # start listening before accepting a connection
    client, addr = sock.accept()
    # I need to find a way to receive the client data such that it stops receiving at the </messageBody> tag

I'm trying to find the most efficient way of going about this, as the server may be receiving several hundred messages per second from various clients. The size of these messages could be anywhere between a few bytes and several kilobytes.

CodePudding user response:

I think Python's expat parser can help you; it allows streaming parsing and the chunks can be fragments of XML (like bar in my example below).

I'm pretty sure I comprehend your issue and its context. Here's my attempt to show your server receiving this sample XML:

<root>
    <foo />
    <bar />
    <messageBody>
        <ewqf3x />
        <awdwad2 />
    </messageBody>
    <baz />
</root>

but in chunks, as if the client were feeding you the entire XML body over many calls. Each chunk is parsed, and when the </messageBody> end-tag is read, an exception is raised, which is your signal that you have everything you need and can stop processing (listening?).

#!/usr/bin/env python3
import sys
from xml.parsers.expat import ParserCreate

class FoundMessageBodyEnd(Exception):
    pass

def end_element(name):
    print(f'Processing end-tag for {name}')
    if name == 'messageBody':
        # This may not be the right way to do this
        raise FoundMessageBodyEnd


p = ParserCreate()
p.EndElementHandler = end_element

streaming_chunks = [
    '''<root>
    <foo />
    <bar ''',  # notice that bar is not closed till the first line of the next chunk
    '''/>
        <messageBody>
        <ewqf3x />
        <awdwad2 />''',
    '''    </messageBody>''',
    '''    <baz />
</root>''',
]

parsed = 0
for chunk in streaming_chunks:
    try:
        p.Parse(chunk)
        parsed += 1
    except FoundMessageBodyEnd:
        print(f'After parsing {parsed + 1} chunks, found messageBody delimiter, done.')
        sys.exit(1)
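
For the socket in the question, where the client sends one <messageBody> after another with no enclosing root element, a single expat parser would reject everything after the first closing tag, because the stream is not one well-formed document. A plain byte-buffer scan for the delimiter may be simpler in that case. The following is only a rough sketch under that assumption: iter_messages, DELIMITER and bufsize are names made up for illustration, and it assumes the closing tag always arrives exactly as the bytes </messageBody>.

```python
import socket

# Hypothetical delimiter: the closing tag from the question, as raw bytes.
DELIMITER = b'</messageBody>'

def iter_messages(conn, bufsize=4096):
    """Yield each complete message (delimiter included) received on conn.

    Illustrative helper, not part of the socket module.
    """
    buffer = bytearray()
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:
            # An empty read means the peer closed the connection.
            return
        # Only the tail of the old buffer can hold the start of a delimiter
        # that was split across two recv() calls, so resume the search there.
        search_from = max(0, len(buffer) - len(DELIMITER) + 1)
        buffer += chunk
        while True:
            end = buffer.find(DELIMITER, search_from)
            if end == -1:
                break
            end += len(DELIMITER)
            yield bytes(buffer[:end])
            del buffer[:end]      # keep whatever follows for the next message
            search_from = 0
```

In the server from the question, this would be driven with something like `for message in iter_messages(client): ...` after `accept()` returns. Resuming the search near the end of the old buffer avoids rescanning kilobytes of data every time a new chunk arrives, which is where most of the cost would otherwise go for larger messages.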