send a json containing a big buffer (bytearray) through sockets: gets truncated

Time:07-22

I'm trying to send a JSON object containing text fields and a buffer (a bytearray) from a microcontroller to a Windows server:

msg = {"some_stuff": "some_stuff", "buf": bytearray(b'\xfe\xc2\xf1\xfe\xd5\xc0 ...')}

Note that the buffer is quite long (too long to include here): len(buf) -> 35973

I send the length of the message to the server first, so that it knows how many bytes to expect:

import json

def send_json(conn, msg):
    msg = json.dumps(msg).encode('utf-8')
    msg_length = len(msg)
    header = str(msg_length).encode('utf-8')
    header += b' ' * (64 - len(header))  # pad the header with spaces to 64 bytes
    conn.send(header)
    conn.send(msg)
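An aside on the sending code: socket.send() can also return after writing only part of the buffer (it returns the number of bytes actually sent), whereas socket.sendall() keeps sending until everything is written or an error is raised. A minimal sketch of the same 64-byte space-padded header framing using sendall, assuming CPython sockets and a message that json.dumps can serialize (HEADER_SIZE, make_header and send_framed are hypothetical names):

```python
import json
import socket

HEADER_SIZE = 64  # hypothetical constant: same fixed-width header as in the question


def make_header(length: int) -> bytes:
    """Encode a length as ASCII digits padded with spaces to HEADER_SIZE bytes."""
    header = str(length).encode('ascii')
    if len(header) > HEADER_SIZE:
        raise ValueError('length does not fit in the header')
    return header + b' ' * (HEADER_SIZE - len(header))


def send_framed(conn: socket.socket, payload: bytes) -> None:
    """Send the fixed-width length header, then the payload."""
    # sendall() keeps calling send() until every byte is written (or raises)
    conn.sendall(make_header(len(payload)))
    conn.sendall(payload)


def send_json(conn: socket.socket, msg) -> None:
    """Variant of the question's send_json built on send_framed."""
    send_framed(conn, json.dumps(msg).encode('utf-8'))
```

The receiver still has to loop over recv(), because sendall() only guarantees that the bytes leave the sender, not that they arrive in one piece.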

The receiving function is:

def receive_json(conn) -> dict:
    msg_length = int(
        conn.recv(64).decode('utf-8').replace(' ', '')
    )
    msg_b = conn.recv(msg_length)
    msg_s = msg_b.decode('utf-8')
    try:
        msg_d = json.loads(msg_s)
    except:
        msg_d = eval(msg_s)
    return msg_d

The problem is that the received message is truncated.

msg_b = b'{"buf": bytearray(b\'\\xfe\\xc2\\xf1 ... \\x06u\\xd0\\xff\\xb'

It's worth mentioning that when debugging, if I pause on a breakpoint at the line msg_b = conn.recv(msg_length) for a while before executing it, the received message is complete.

So it seems that conn.recv(msg_length) in the receiving function does not wait until msg_length bytes have arrived.

Why is this the case? What can I do to receive the complete message?

I could add a time.sleep between receiving the length and receiving the message, but how would I know how long to wait for a given message length?

Thank you

Answer:

My solution was to check how much of the message is still missing and keep calling recv until the message is complete:

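import json

def receive_json(conn) -> dict:
    msg_length = int(
        conn.recv(64).decode('utf-8').replace(' ', '')
    )
    buf = bytearray(b'')
    # recv(n) returns *up to* n bytes, so keep reading until the whole message is in
    while len(buf) < msg_length:
        missing_length = msg_length - len(buf)
        packet = conn.recv(missing_length)
        if not packet:  # b'' means the sender closed the connection early
            raise ConnectionError('connection closed before the full message arrived')
        buf.extend(packet)
    msg_s = buf.decode('utf-8')
    try:
        msg_d = json.loads(msg_s)
    except ValueError:
        # a payload with a bytearray(...) repr is not valid JSON; eval() is unsafe on untrusted input
        msg_d = eval(msg_s)
    return msg_d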
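The 64-byte header read by conn.recv(64) can be split across packets too, so the same loop is worth applying to it. A minimal sketch that factors the loop into a helper, assuming the payload is valid JSON (unlike the bytearray repr in the question) and using the hypothetical names recv_exactly and HEADER_SIZE:

```python
import json
import socket

HEADER_SIZE = 64  # fixed-width, space-padded length header, as in the question


def recv_exactly(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from conn, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:  # b'' means the peer closed the connection
            raise ConnectionError(f'connection closed after {len(buf)} of {n} bytes')
        buf.extend(chunk)
    return bytes(buf)


def receive_json(conn: socket.socket):
    """Variant of receive_json that reads both header and body with recv_exactly."""
    header = recv_exactly(conn, HEADER_SIZE)
    length = int(header.decode('ascii').strip())
    return json.loads(recv_exactly(conn, length).decode('utf-8'))
```

Raising on an empty chunk matters: without that check, a sender that disconnects mid-message would make the while loop spin forever.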

Answer:

TCP is a streaming protocol: it guarantees that bytes are delivered in the order they were sent, but not with the same boundaries as the original send calls, so a single recv(n) returns whatever has arrived so far, up to n bytes. You need to define a protocol (which you have: a 64-byte header holding the message size, followed by the message data) and then buffer reads until you have a complete message.
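The "up to n bytes" behaviour is easy to see locally. This sketch uses socket.socketpair(), a connected pair of local stream sockets, as a stand-in for the TCP connection (an assumption made only so the example runs without a server):

```python
import socket

# connected pair of local stream sockets standing in for the client/server link
a, b = socket.socketpair()

a.sendall(b'hello')        # only 5 bytes are in flight
data = b.recv(100)         # asks for *up to* 100 bytes
print(len(data), data)     # recv returned as soon as some data was available

a.close()
tail = b.recv(100)         # b'' once the peer has closed and the buffer is drained
print(tail)
b.close()
```

This is also why pausing on a breakpoint "fixed" the question's code: by the time recv ran, all the bytes had already arrived.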

Python sockets have a .makefile() method that returns a buffered file-like object, so you can .read(n) exactly n bytes (it blocks until they arrive or the connection closes) or .readline() to read a newline-terminated line. With this you can implement the following client and server:

server.py

import socket
import json

s = socket.socket()
s.bind(('',5000))
s.listen()

while True:
    c,a = s.accept()
    print(f'{a} connected')
    # wrap socket in a file-like buffer
    with c, c.makefile('rb') as r: # read binary so .read(n) gets n bytes
        while True:
            header = r.readline() # read header up to a newline
            if not header: break  # if empty string, client closed connection
            size = int(header)
            data = json.loads(r.read(size)) # read exactly "size" bytes and decode JSON
            print(f'{a}: {data}')
    print(f'{a} disconnected')
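The server relies on the buffered reader hiding packet boundaries. A small sketch that checks this locally, using socket.socketpair() as a stand-in for the client connection (a testing assumption, not part of the answer's code); the writer deliberately splits the header and body into odd pieces:

```python
import socket

a, b = socket.socketpair()
# the writer sends a newline-terminated header and an 11-byte body in small pieces
for piece in (b'1', b'1\n', b'hello', b' world'):
    a.sendall(piece)
a.close()

with b, b.makefile('rb') as r:
    header = r.readline()        # reads up to and including the newline
    body = r.read(int(header))   # keeps reading until exactly 11 bytes (or EOF)
    rest = r.read()              # nothing left once the peer has closed

print(header, body, rest)
```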

client.py

import socket
import json

def send_json(conn, msg):
    # ensure_ascii=False sends non-ASCII characters as UTF-8 rather than \uXXXX escapes (smaller data)
    data = json.dumps(msg, ensure_ascii=False).encode()
    msg_length = len(data) # length in encoded bytes
    # send newline-terminated header, then data
    conn.sendall(f'{msg_length}\n'.encode())
    conn.sendall(data)

s = socket.socket()
s.connect(('localhost',5000))
with s:
    send_json(s, {'name':'马克'})  # checks that non-ASCII text is handled properly
    send_json(s, [1,2,3])

Start server.py, then run client.py a couple of times:

Output:

('127.0.0.1', 26013) connected
('127.0.0.1', 26013): {'name': '马克'}
('127.0.0.1', 26013): [1, 2, 3]
('127.0.0.1', 26013) disconnected
('127.0.0.1', 26015) connected
('127.0.0.1', 26015): {'name': '马克'}
('127.0.0.1', 26015): [1, 2, 3]
('127.0.0.1', 26015) disconnected
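If the sender must keep the question's fixed 64-byte space-padded header instead of a newline-terminated one, the makefile approach still applies: read the header with r.read(64) instead of r.readline(). A minimal sketch, assuming the body is valid JSON; read_message and HEADER_SIZE are hypothetical names, and io.BytesIO stands in for c.makefile('rb') so it runs offline:

```python
import io
import json

HEADER_SIZE = 64  # the question's fixed-width, space-padded length header


def read_message(r):
    """Read one framed JSON message from a binary file-like object.

    Returns None on a clean end of stream between messages.
    """
    header = r.read(HEADER_SIZE)
    if not header:
        return None  # peer closed the connection between messages
    if len(header) < HEADER_SIZE:
        raise EOFError('connection closed in the middle of a header')
    size = int(header)  # int() ignores the padding spaces
    body = r.read(size)
    if len(body) < size:
        raise EOFError('connection closed in the middle of a message')
    return json.loads(body)


# offline check: an in-memory stream in place of c.makefile('rb')
body = json.dumps({'name': '马克'}, ensure_ascii=False).encode()
stream = io.BytesIO(str(len(body)).encode().ljust(HEADER_SIZE) + body)
print(read_message(stream))   # the decoded dict
print(read_message(stream))   # None: end of stream
```

In the server above, this would replace the readline()/read(size) pair inside the inner loop, stopping when read_message returns None.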