Cannot send large binary files over Python socket

Time:04-17

I have created server.py and client.py with the intention of sending both text and binary files between the two. My code works for small text files and small binary files; however, large binary files do not work.

In my testing, I use a 1.5 KB .ZIP file and I can send this without any problem. However, when I try sending a 44 MB .ZIP file, I am running into an issue.

My client code works as follows:

  1. The client creates a dictionary containing metadata about the file to be sent.
  2. The binary file is base64 encoded and is added as a value to the "filecontent" key of the dictionary.
  3. The dictionary is JSON serialised.
  4. The length of the serialised dictionary is calculated and fixed-length prefixed to the serialised dictionary.
  5. The client sends the entire message to the server.
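The client-side steps above can be sketched roughly as follows. This is a minimal illustration rather than the exact client code: `frame_message` is a hypothetical helper name, and the length is measured on the encoded bytes, since bytes are what actually travel over the socket. With the default `json.dumps` settings the output is pure ASCII, so the character count and the byte count are the same.

```python
import base64
import json

HEADER = 10  # width of the fixed-length size prefix, in bytes


def frame_message(filename: str, author: str, content: bytes) -> bytes:
    """Build the size prefix plus JSON payload (hypothetical helper)."""
    msg = {
        "filename": filename,
        "author": author,
        # base64 turns the binary data into ASCII text that JSON can carry
        "filecontent": base64.b64encode(content).decode("ascii"),
    }
    payload = json.dumps(msg).encode("utf-8")
    # Left-align the payload size in a field of HEADER characters.
    header = "{:<{width}}".format(len(payload), width=HEADER).encode("ascii")
    return header + payload


framed = frame_message("test.zip", "marc", b"\x00\x01binary\xff")
size = int(framed[:HEADER].decode("ascii"))  # int() ignores the padding spaces
```

The resulting `framed` bytes are what would be passed to `client.sendall()`.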

On the server:

  1. The server receives the fixed-length header and interprets the size of the message in the transmission.
  2. The server reads the message in chunks of MAXSIZE (for testing set to 500), storing them temporarily.
  3. Once the entire message is received, the server joins the entire message.
  4. The server base64 decodes the value belonging to the "filecontent" key.
  5. Next, it writes the content of the file to disk.

As I said, this works fine for my 1.5 KB .ZIP file, but for the 44 MB .ZIP file it breaks in step 3 on the server. The error is thrown by the json.decoder. It complains about "Unterminated string starting at..."

While troubleshooting, I found that the last part of the message did not arrive, which explains the complaint from the json.decoder. I also found that the client sends 61841613 as the fixed-length header, where it should be 62279500: a difference of 437887.

When I do not let the client calculate the size of the message but simply hardcode the size as 62279500, everything works as expected. That leads me to believe there is something wrong with the way the client calculates the message size for larger files. However, I cannot work out what's wrong.

Here are the relevant parts of the code:

# client.py

connected = True
while connected:
    # Actual dictionary contains more metadata
    msg = { "filename" : "test.zip" , "author" : "marc" , "filecontent" : "" }

    myfile = open("test.zip", "rb")
    encoded = base64.b64encode(myfile.read())
    msg["filecontent"] = encoded.decode("ascii")

    msg = json.dumps(msg)
    header = "{:<10}".format(len(msg))
    header_msg = header + msg

    client.sendall(header_msg.encode("utf-8"))

# server.py

HEADER = 10
MAXSIZE = 500

connected = True
while connected:
    msg = conn.recv(HEADER).decode("utf-8")
    SIZE = int(msg)

    totalmsg = []
    while SIZE > 0:
        if SIZE > MAXSIZE:
            msg = conn.recv(MAXSIZE).decode("utf-8")
            totalmsg.append(msg)
            SIZE = SIZE - MAXSIZE
        else:
            msg = conn.recv(SIZE).decode("utf-8")
            totalmsg.append(msg)
            SIZE = 0

    msg = json.loads("".join(totalmsg))
    decoded = base64.b64decode(msg["filecontent"])

    myfile = open(msg["filename"], "wb")
    myfile.write(decoded)
    myfile.close()

Answer:

As mentioned in the comments, conn.recv(MAXSIZE) returns at most MAXSIZE bytes but can return fewer. The server code assumes it always receives the amount requested: it subtracts MAXSIZE from SIZE regardless of how many bytes actually arrived, so it stops reading before the whole message is in. There is also no reason to base64-encode the file data; it just makes the file data much larger. Sockets are a byte stream, so just send the bytes.
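To make the short-read point concrete, here is a minimal sketch of a receive loop that counts the bytes that actually arrived instead of the bytes requested. `recv_exact` is a hypothetical helper name, and `ShortReadConn` is a made-up stand-in for a socket that hands back at most three bytes per call, so the behaviour can be reproduced without a network.

```python
class ShortReadConn:
    """Hypothetical stand-in for a socket whose recv() returns short reads."""

    def __init__(self, data: bytes, max_chunk: int = 3):
        self._data = data
        self._max_chunk = max_chunk

    def recv(self, n: int) -> bytes:
        # Like socket.recv(), return at most n bytes, and possibly fewer.
        n = min(n, self._max_chunk)
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk


def recv_exact(conn, n: int) -> bytes:
    """Call recv() repeatedly until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = conn.recv(min(remaining, 4096))
        if not chunk:  # an empty result means the peer closed the connection
            raise ConnectionError(f"closed with {remaining} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)  # subtract what arrived, not what was asked for
    return b"".join(chunks)


conn = ShortReadConn(b"0123456789")
first = recv_exact(conn, 10)  # needs four recv() calls to collect 10 bytes
```

The same helper would work with a real connected socket in place of `ShortReadConn`, for both the 10-byte header and the message body.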

The header can be separated from the data by a marker. Below, I've used CRLF and written the header as a single JSON line; the example also demonstrates sending a couple of files on the same connection:
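The framing itself can be tried out without a network. The sketch below is only an illustration: `pack` and `unpack_all` are hypothetical names, and an `io.BytesIO` buffer stands in for the file-like object that `socket.makefile('rb')` would provide. Because the body is read by length, it may contain CRLF bytes without confusing the parser.

```python
import io
import json


def pack(filename: str, author: str, content: bytes) -> bytes:
    """One JSON header line terminated by CRLF, followed by the raw bytes."""
    header = {"filename": filename, "author": author, "length": len(content)}
    return json.dumps(header).encode("utf-8") + b"\r\n" + content


def unpack_all(stream):
    """Yield (header, content) pairs from a binary file-like object."""
    while True:
        line = stream.readline()  # reads up to and including the LF
        if not line:  # end of stream
            return
        header = json.loads(line)  # surrounding whitespace (CRLF) is ignored
        content = stream.read(header["length"])
        if len(content) != header["length"]:
            raise EOFError("stream ended in the middle of a file")
        yield header, content


# Two files back to back on the same stream; the first body contains CRLF.
stream = io.BytesIO(
    pack("a.bin", "marc", b"\x00\r\n\xff") + pack("mini.txt", "Mark", b"hello")
)
received = list(unpack_all(stream))
```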

client.py

import socket
import json

def transmit(sock, filename, author, content):
    msg = {'filename': filename, 'author': author, 'length': len(content)}
    data = json.dumps(msg, ensure_ascii=False).encode() + b'\r\n' + content
    sock.sendall(data)

client = socket.socket()
client.connect(('localhost',5000))
with client:
    with open('test.zip','rb') as f:
        content = f.read()
    transmit(client, 'test.zip', 'marc', content)
    content = b'The quick brown fox jumped over the lazy dog.'
    transmit(client, 'mini.txt', 'Mark', content)

server.py

import socket
import json
import os

os.makedirs('Downloads', exist_ok=True)

s = socket.socket()
s.bind(('',5000))
s.listen()

while True:
    c, a = s.accept()
    print('connected:', a)
    r = c.makefile('rb')   # wrap socket in a file-like object
    with c, r:
        while True:
            header_line = r.readline() # read in a full line of data
            if not header_line: break
            header = json.loads(header_line) # process the header
            print(header)
            remaining = header['length']
            with open(os.path.join('Downloads',header['filename']), 'wb') as f:
                while remaining :
                    # Unlike socket.recv() the makefile object won't return less
                    # than requested unless the socket is closed.
                    count = f.write(r.read(min(10240, remaining)))
                    if not count:  # socket closed?
                        if remaining:
                            print('Unsuccessful')
                        break
                    remaining -= count
                else:
                    print('Success')
    print('disconnected:', a)

Output:

connected: ('127.0.0.1', 14117)
{'filename': 'test.zip', 'author': 'marc', 'length': 52474063}
Success
{'filename': 'mini.txt', 'author': 'Mark', 'length': 45}
Success
disconnected: ('127.0.0.1', 14117)