How to send `files_upload_session_append_v2` in parallel? (Dropbox API, Python)


I want to upload a large file to Dropbox via the Dropbox API in parallel, so that it uploads faster than it would sequentially.

The documentation says:

By default, upload sessions require you to send content of the file in
sequential order via consecutive :meth:`files_upload_session_start`,
:meth:`files_upload_session_append_v2`,
:meth:`files_upload_session_finish` calls. For better performance, you
can instead optionally use a ``UploadSessionType.concurrent`` upload
session. To start a new concurrent session, set
``UploadSessionStartArg.session_type`` to
``UploadSessionType.concurrent``. After that, you can send file data in
concurrent :meth:`files_upload_session_append_v2` requests. Finally
finish the session with :meth:`files_upload_session_finish`. There are
couple of constraints with concurrent sessions to make them work. You
can not send data with :meth:`files_upload_session_start` or
:meth:`files_upload_session_finish` call, only with
:meth:`files_upload_session_append_v2` call. Also data uploaded in
:meth:`files_upload_session_append_v2` call must be multiple of 4194304
bytes (except for last :meth:`files_upload_session_append_v2` with
``UploadSessionStartArg.close`` to ``True``, that may contain any
remaining data).

but I'm not sure how to implement this (since there is no files_upload_async_session_append_v2(), for example). I can't find any examples on the Internet.

I tried the following code, but there is no upload speedup compared to the sequential version:

import asyncio
from os import path
from typing import BinaryIO

from dropbox.base import DropboxBase
from dropbox.files import CommitInfo, UploadSessionCursor, UploadSessionType


async def upload_file(local_file_path: str, remote_folder_path: str, client: DropboxBase):
    """
        Uploads a file to Dropbox by chunks. This method uses v2 methods of Dropbox API.

        Example:
            upload_file('test.txt', '/Builds/', dropbox_client)

        :param local_file_path:
        :param remote_folder_path: A path to a folder on Dropbox, must end with a slash.
        :param client: Authorized Dropbox client.
        :return:
    """
    with open(local_file_path, 'rb') as file_stream:
        await __upload_file_by_chunks(file_stream, local_file_path, remote_folder_path, client)

async def test(data: bytes, cursor: UploadSessionCursor, client: DropboxBase, close: bool = False):
    # Wraps files_upload_session_append_v2 so a single chunk upload can be scheduled as a task.
    client.files_upload_session_append_v2(data, cursor, close=close)


async def __upload_file_by_chunks(file_stream: BinaryIO, local_file_path: str, remote_folder_path: str, client: DropboxBase):
    # 4 MB was chosen as the default chunk size; I think it's a good compromise between speed and reliability.
    # Also, the Dropbox API guide (https://developers.dropbox.com/dbx-performance-guide) says "Consider uploading chunks in multiples of 4 MBs."
    # ATTENTION: The maximum value that can be placed here is 150 MB.
    chunk_size_bytes = 4 * 1024 * 1024

    session_id = __start_upload_session(client)
    cursor = __create_upload_session_cursor(file_stream, session_id)
    file_length = path.getsize(local_file_path)

    test_pool = set()

    # TODO: In theory this can be done in parallel, that should speed up the file upload.
    #  Maybe instead of while loop we can precalculate all chunks and then upload them in parallel.
    while file_stream.tell() < file_length:
        if __chunk_size_is_bigger_than_left_data(file_stream.tell(), file_length, chunk_size_bytes):
            chunk_size_bytes = file_length - file_stream.tell()
            test_pool.add(asyncio.create_task(test(file_stream.read(chunk_size_bytes), cursor, client, close=True)))
            continue

        test_pool.add(asyncio.create_task(test(file_stream.read(chunk_size_bytes), cursor, client)))
        cursor = __create_upload_session_cursor(file_stream, session_id)

    await asyncio.wait(test_pool)

    client.files_upload_session_finish(bytes(), cursor, commit=CommitInfo(
        path=__construct_remote_file_path(local_file_path, remote_folder_path)))


def __start_upload_session(client: DropboxBase) -> str:
    session_start_response = client.files_upload_session_start(bytes(), session_type=UploadSessionType.concurrent)
    return session_start_response.session_id

...

asyncio.run(upload_file(file_name, DROPBOX_TEST_FOLDER, client))

CodePudding user response:

Using the "concurrent" mode is a good way to optimize the upload of a file to Dropbox using upload sessions, as it allows you to upload multiple pieces of the file in parallel.

There's a code sample here that shows how to use the "concurrent" mode, including using files_upload_session_append_v2:

https://github.com/dropbox/Developer-Samples/tree/master/Blog/performant_upload
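In case that link changes, here is a minimal sketch of the same idea (my own code, not the linked sample, and the function and variable names are mine), assuming the official synchronous dropbox Python SDK. One likely reason the code in the question shows no speedup is that files_upload_session_append_v2 is a blocking call, so awaiting coroutines that call it directly still runs the requests one after another; offloading each append to a worker thread, for example with asyncio.to_thread, lets them actually overlap:

import asyncio
from os import path

from dropbox import Dropbox
from dropbox.files import CommitInfo, UploadSessionCursor, UploadSessionType

# Concurrent appends must be multiples of 4,194,304 bytes (except the closing one).
CHUNK_SIZE = 4 * 1024 * 1024


async def upload_file_parallel(dbx: Dropbox, local_file_path: str, remote_file_path: str) -> None:
    file_size = path.getsize(local_file_path)

    # A concurrent session is started without any data.
    session_id = dbx.files_upload_session_start(
        b"", session_type=UploadSessionType.concurrent
    ).session_id

    # Pre-read the chunks so every append knows its own byte offset.
    chunks = []
    with open(local_file_path, "rb") as file_stream:
        offset = 0
        while offset < file_size:
            data = file_stream.read(CHUNK_SIZE)
            chunks.append((offset, data))
            offset += len(data)

    async def append_chunk(offset: int, data: bytes) -> None:
        cursor = UploadSessionCursor(session_id=session_id, offset=offset)
        is_last = offset + len(data) >= file_size
        # The SDK call blocks, so run it in a worker thread to get real parallelism.
        await asyncio.to_thread(
            dbx.files_upload_session_append_v2, data, cursor, close=is_last
        )

    # Upload all chunks concurrently.
    await asyncio.gather(*(append_chunk(offset, data) for offset, data in chunks))

    # Finish with no data and a cursor whose offset equals the total file size.
    cursor = UploadSessionCursor(session_id=session_id, offset=file_size)
    dbx.files_upload_session_finish(b"", cursor, CommitInfo(path=remote_file_path))


# Usage, assuming an authorized client and a full Dropbox destination path:
# asyncio.run(upload_file_parallel(client, file_name, DROPBOX_TEST_FOLDER + path.basename(file_name)))

In practice you would probably cap the number of simultaneous appends (for example with an asyncio.Semaphore, or a ThreadPoolExecutor with a fixed number of workers) rather than firing every chunk at once. Note that asyncio.to_thread requires Python 3.9+; on older versions loop.run_in_executor with a thread pool does the same job.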
