Splitting rows correctly using multiprocessing package

Time:09-21

I am trying to process a large dataset, and for faster execution I am using the multiprocessing package. I want to break my rows into smaller arrays; for example, I have 209 rows (x), which is my range here.

q = [[i[0],i[-1]] for i in np.array_split(range(x), processors)]

This gives me the following output:

[[0, 26], [27, 53], [54, 79], [80, 105], [106, 131], [132, 157], [158, 183], [184, 209]]

Expected output:

[[0, 26], [26, 53], [53, 79], [79, 105], [105, 131], [131, 157], [157, 183], [183, 209]]

Note: I only want to change the first value of each sublist, leaving the first sublist, [0, 26], untouched. Additionally, I don't want to modify the multiprocessing package, because it splits the rows correctly even as the row values keep changing.

CodePudding user response:

The following code should do the trick:

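import numpy as np

x = 210          # number of rows; indices run from 0 to 209
processors = 8   # number of chunks, one per worker process

# Keep the first pair as-is; for every later pair, move the start back by one
# so that it equals the previous pair's end.
q = [[i[0], i[-1]] if idx == 0 else [i[0] - 1, i[-1]]
     for idx, i in enumerate(np.array_split(range(x), processors))]
print(q)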
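
An equivalent way to get the same pairs, sketched here as an alternative rather than as the only approach, is to collect the last index of each chunk and pair up neighbouring boundaries. The function name `shared_bounds` is made up for this illustration. Like the one-liner above, it assumes there are at least as many rows as processors, so that no chunk is empty.

```python
import numpy as np


def shared_bounds(x, processors):
    """Return [start, end] pairs where each start equals the previous end.

    `shared_bounds` is a hypothetical helper name, not part of NumPy.
    Assumes x >= processors so that every chunk has at least one element.
    """
    chunks = np.array_split(np.arange(x), processors)
    # Boundary list: 0 followed by the last index of every chunk.
    edges = [0] + [int(chunk[-1]) for chunk in chunks]
    # Pair each boundary with the next one.
    return [[start, end] for start, end in zip(edges, edges[1:])]


q = shared_bounds(210, 8)
print(q)
```

Converting with `int(...)` keeps the result a list of plain Python integers instead of NumPy scalars.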

That said, it seems you are trying to grab the start and end indices of each chunk (I guess). Why not just use the result of np.array_split directly? You feed it an array, and it splits it into chunks that you can pass to the processes.
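
A minimal sketch of that idea, assuming the dataset is a 2-D NumPy array and that the work done on each chunk is a simple row sum. The names `split_rows`, `row_sums`, and `process_in_parallel` are invented for this example, and the row sum is only a stand-in for the real computation.

```python
import multiprocessing as mp

import numpy as np


def split_rows(data, processors):
    """Split `data` along its first axis into `processors` nearly equal chunks."""
    return np.array_split(data, processors)


def row_sums(chunk):
    """Placeholder per-chunk work (an assumption): sum each row of the chunk."""
    return chunk.sum(axis=1)


def process_in_parallel(data, processors):
    """Send each chunk to a worker process and stitch the results back together."""
    chunks = split_rows(data, processors)
    with mp.Pool(processes=processors) as pool:
        # pool.map preserves the order of the chunks in its result list.
        partial_results = pool.map(row_sums, chunks)
    return np.concatenate(partial_results)
```

No index pairs are needed here, because each worker receives its rows directly. On platforms that start workers by spawning a fresh interpreter (Windows and macOS by default), call `process_in_parallel` from under an `if __name__ == "__main__":` guard.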
