How to split a list into equal chunks in Python?

I am trying to add users to my group using Pyrogram. I have 200 user IDs in a list:

list_of_users = [user_id1, user_id2, user_id3, user_id4, ...]

I also have a list of 7 clients. What I want to do is distribute the user IDs among the 7 clients (approximately equally) and add them. I sometimes have an uneven number of users, so how do I distribute the list and add the users accordingly in Python?

By the way, it's okay if 2-3 users are not perfectly evenly distributed. I only need an approximately even distribution, but no user should be missed.

I tried this function:

def divide_chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

But it doesn't distribute evenly: it produces chunks of a fixed size and puts whatever remains into a final, smaller chunk, which is not what I want.

In short: I want the chunk sizes to be decided automatically so that the user IDs are distributed evenly.

In most answers on Stack Overflow we have to decide the chunk size ourselves, which I don't want to do. All I want is to distribute x items into y equal parts.

CodePudding user response：

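You can use NumPy's array_split(), which takes the number of chunks and lets their sizes differ by at most one element:

import numpy as np

chunks = np.array_split(list_of_users, NUMBER_OF_CLIENTS)

The result is a list of NumPy arrays, one per client.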

DIY: Without external libraries

Here is one approach without external libraries. This implementation assigns an equal number of users to each client if possible. If not, it makes sure that the number of users assigned to any two clients differs by at most 1 (= my definition of fair). Additionally, it makes sure that the extra users are not assigned to the same clients every time if you run this multiple times. It does this by randomly choosing the clients that each take on one of the remaining users (those that could not be assigned in equal parts). This ensures a fair allocation of users to clients.

This is a fair amount of code, so here is a high-level explanation:

The relevant function is called assign_users_to_clients(). It does the job you intend to do. The two other functions, verify_all_users_assigned() and print_mapping(), are just utility functions for the sake of this demo. The first makes sure the assignment is correct, i.e. every user is assigned to exactly one client (no duplicate assignments, no unassigned users), and the second prints the result a bit more nicely so you can verify that the distribution of users to clients is actually fair.

import random


def verify_all_users_assigned(users, client_user_dict):
    """
    Verify that all users have indeed been assigned to a client.
    Not necessary for the algorithm but used to check whether the implementation is correct.
    :param users: list of all users that have to be assigned
    :param client_user_dict: assignment of users to clients
    :return:
    """
    users_assigned_to_clients = set()
    duplicate_users = list()

    for clients_for_users in client_user_dict.values():
        client_set = set(clients_for_users)
        # if there is an intersection those users have been assigned twice (at least)
        inter = users_assigned_to_clients.intersection(client_set)
        if len(inter) != 0:
            duplicate_users.extend(list(inter))
        # now take the union so we know which users have already been processed
        users_assigned_to_clients = users_assigned_to_clients.union(client_set)
    all_users = set(users)
    # users that are in the input list but were never assigned to any client
    remaining_users = all_users.difference(users_assigned_to_clients)
    if len(remaining_users) != 0:
        print(f"Not all users have been assigned to clients. Missing are {remaining_users}")
        return
    if len(duplicate_users) != 0:
        print(f"Some users have been assigned at least twice. Those are {duplicate_users}")
        return
    print(f"All users have successfully been assigned to clients.")


def assign_users_to_clients(users, clients):
    """
    Assign users to clients.
    :param users: list of users
    :param clients: list of clients
    :return: dictionary with mapping from clients to users
    """
    users_per_client = len(users) // len(clients)
    remaining_clients = len(users) % len(clients)
    if remaining_clients != 0:
        print(
            f"An equal split is not possible! {remaining_clients} users would remain when each client takes on {users_per_client} users. Assigning remaining users to random clients.")

    # assign each client his fair share of users
    client_users = list()
    for client_index in range(len(clients)):
        # slice out the block of users that belongs to this client's share;
        # iterating over the clients (instead of stepping through the users) also
        # works when there are fewer users than clients (users_per_client == 0)
        first_user = client_index * users_per_client
        last_user = first_user + users_per_client
        client_users.append(users[first_user:last_user])

    # Assign clients and users as determined above
    client_user_registry = {clients[i]: client_users[i] for i in range(len(clients))}
    # now we need to take care of the remaining users
    # we could just go from back to front and assign one more user to each client but to make it fair, choose randomly without repetition
    start = users_per_client * len(clients)
    for i, client in enumerate(random.sample(clients, k=remaining_clients)):
        client_user_registry[client].append(users[start + i])
    return client_user_registry


def print_mapping(mapping):
    print("""
 -------------------------
| Mapping: User -> Client
 -------------------------""")
    for client, users in mapping.items():
        # convert to str first so that integer user ids can be joined as well
        print(f" - Client: {client}\t =>\t Users ({len(users)}): {', '.join(map(str, users))}")


# users that need to be assigned
list_of_users = ["user_id1", "user_id2", "user_id3", "user_id4", "user_id5", "user_id6", "user_id7", "user_id8",
                 "user_id9", "user_id10", "user_id11",
                 "user_id12", "user_id13", "user_id14", "user_id15", "user_id16", "user_id17", "user_id18",
                 "user_id19",
                 "user_id20", "user_id21", "user_id22", "user_id23", "user_id24", "user_id25", "user_id26"]
# clients to assign users to
list_of_clients = ["client_1", "client_2", "client_3", "client_4", "client_5", "client_6", "client_7"]

# do assignment of users to clients
client_user_assignment = assign_users_to_clients(list_of_users, list_of_clients)

# verify that the algorithm works (just for demo purposes)
verify_all_users_assigned(list_of_users, client_user_assignment)

# print assignment
print_mapping(client_user_assignment)

Expected output

An equal split is not possible! 5 users would remain when each client takes on 3 users. Assigning remaining users to random clients.
All users have successfully been assigned to clients.

 -------------------------
| Mapping: User -> Client
 -------------------------
 - Client: client_1  =>  Users (4): user_id1, user_id2, user_id3, user_id23
 - Client: client_2  =>  Users (4): user_id4, user_id5, user_id6, user_id26
 - Client: client_3  =>  Users (3): user_id7, user_id8, user_id9
 - Client: client_4  =>  Users (3): user_id10, user_id11, user_id12
 - Client: client_5  =>  Users (4): user_id13, user_id14, user_id15, user_id24
 - Client: client_6  =>  Users (4): user_id16, user_id17, user_id18, user_id25
 - Client: client_7  =>  Users (4): user_id19, user_id20, user_id21, user_id22

Please note: because random.sample() randomly chooses the clients that take on one more user, your result might differ, but it will always be fair (see the definition of fair above).
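If the random choice is not needed, the same fairness property (chunk sizes differ by at most 1) can be sketched more compactly with divmod(). This is only a minimal sketch, not a replacement for the code above: the name split_evenly() is made up for this example, and it always hands the extra users to the first chunks instead of to randomly chosen ones.

```python
def split_evenly(items, parts):
    """Split items into `parts` consecutive chunks whose sizes differ by at most 1."""
    if parts <= 0:
        raise ValueError("parts must be a positive integer")
    # base: size every chunk gets; extra: how many chunks get one more item
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for index in range(parts):
        # the first `extra` chunks take one additional item
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# 26 hypothetical user ids, as in the demo above
users = [f"user_id{n}" for n in range(1, 27)]
chunks = split_evenly(users, 7)
print([len(chunk) for chunk in chunks])  # [4, 4, 4, 4, 4, 3, 3]
```

To pair the chunks with clients you could then use something like dict(zip(list_of_clients, chunks)). No user is dropped, because the chunk sizes always add up to len(items).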

With external libraries

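When using external libraries there are many options. See e.g. pandas.cut() (which bins values into intervals), numpy.split() or numpy.array_split(). They behave differently when an equal split of users among clients is not possible: numpy.array_split() returns chunks whose sizes differ by at most one, whereas numpy.split() raises a ValueError when asked for a number of sections that does not divide the list evenly. You should read up on the details in the documentation.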