Python multiprocessing manager showing error when used in flask API

Time:10-22

I am pretty confused about the best way to do what I am trying to do.

What do I want?

  1. API call to the flask application
  2. The Flask route starts 4-5 processes using multiprocessing.Process and combines their results (each computed on a slice of a pandas DataFrame) in a shared Manager().list()
  3. Return computed results back to the client.

My implementation:

pos_iter_list = get_chunking_iter_list(len(position_records), 10000)

manager = Manager()
data_dict = manager.list()
processes = []
for i in range(len(pos_iter_list) - 1):
    temp_list = data_dict[pos_iter_list[i]:pos_iter_list[i + 1]]
    p = Process(
        target=transpose_dataset,
        args=(temp_list, name_space, align_namespace, measure_master_id, df_searchable, products,
              channels, all_cols, potential_col, adoption_col, final_segment, col_map, product_segments,
              data_dict)
    )
    p.start()
    processes.append(p)
for p in processes:
    p.join()

My directory structure:

- main.py (Flask entry point)
- helper.py (contains the function where the above code is executed and which calls transpose_dataset)

The error I get when running this:

RuntimeError: No root path can be found for the provided module "__mp_main__". This can happen because the module came from an import hook that does not provide file name information or because it's a namespace package. In this case the root path needs to be explicitly provided.

Not sure what went wrong here; the manager list works fine when called from a sample.py file using if __name__ == '__main__':

Update: The same piece of code works fine on my MacBook but not on Windows.

A sample flask API call:

@app.route(PREFIX + "ping", methods=['GET'])
def ping():
    man = mp.Manager()
    data = man.list()
    processes = []
    for i in range(0,5):
        pr = mp.Process(target=test_func, args=(data, i))
        pr.start()
        processes.append(pr)

    for pr in processes:
        pr.join()

    return json.dumps(list(data))

CodePudding user response:

Stack has an ongoing bug preventing me from commenting, so I'll just write up an answer..

Python has two main ways to start a new process: "spawn" and "fork". Fork is a system call that exists only on *nix (read: Linux or macOS), so spawn is the only option on Windows. Since Python 3.8, spawn is also the default on macOS, although fork is still available there. The big difference is that fork makes a copy of the existing process, while spawn starts a whole new interpreter (like opening a new cmd window).

There's a lot of nuance to why and how, but with spawn the child has to import the main file before it can run the function you handed it. Importing a file is tantamount to executing that file and then binding its namespace to a variable: import flask runs flask/__init__.py and binds its global namespace to the name flask. Often, however, there is code that only the main process needs and that shouldn't be executed again in the child. In some cases running that code again actually breaks things, so you need to prevent it from running outside the main process. That is what the "magic" variable __name__ is for: it is equal to "__main__" only in the main file of the main process, not in imported modules and not in spawned children, where the re-imported main file is named "__mp_main__".
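A minimal sketch of the re-import behaviour described above, assuming the code is saved to a file and run directly as a script (the work and main functions are made up for illustration). It forces the "spawn" start method so the behaviour on Linux and macOS matches Windows.

```python
import multiprocessing as mp

# Module-level code runs in the parent AND again in every spawned child,
# because each child re-imports this file before it can look up `work`.
print(f"module-level code running, __name__ = {__name__!r}")


def work(i):
    # Target function executed inside the child process.
    print(f"child {i} done")


def main():
    # Force "spawn" so the result does not depend on the platform default.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=work, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Collect the children's exit codes once they have all finished.
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Only the original process enters this block; the re-imported
    # copies of the file in the children skip it.
    print("exit codes:", main())
```

Run as a script, the module-level line should appear once for the parent with '__main__' and once per child with '__mp_main__', while the guarded block runs only in the parent.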

In your specific case, you're creating a new app = Flask(__name__) at import time, which does some amount of validation and checks before you ever run the server. It's one of these setup/validation steps that trips when run from the child process, where __name__ is "__mp_main__" rather than a module Flask can locate. Not letting it run at all is IMO the cleaner fix, but you can also give Flask a value it won't trip over and then simply never start that secondary server (again by protecting it with if __name__ == "__main__":).
