As far as I understand, it is recommended to use numpy.random.SeedSequence() when running parallel processes whose random number generators should be independent of each other. This seems easy with the Python multiprocessing library, but I couldn't come up with a way to get the same behaviour with GNU parallel. Is there a neat way to ensure independent random number generation when running Python scripts via GNU parallel?
CodePudding user response:
Your idea in the comment about using the process ID is valid. The problem is that you want enough entropy in your seed: the output of a pseudo-random number generator (PRNG) is entirely determined by its seed. Using the start time as the seed gives different output from run to run, but parallel jobs launched at (nearly) the same moment will often end up with the same seed and thus the same output.
Using only the process ID is not a good idea either, because it is typically a fairly small number, often carrying just 16 or 32 bits of data. Combining it with the time adds entropy.
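A minimal sketch of that idea with numpy (the script itself is illustrative): SeedSequence accepts a list of integers as entropy, so the process ID and a nanosecond timestamp can be mixed together into a well-spread seed.

    import os
    import time
    import numpy as np

    # Mix the process ID with a nanosecond timestamp; SeedSequence hashes
    # this entropy into a well-spread seed for the generator.
    seed_seq = np.random.SeedSequence(entropy=[os.getpid(), time.time_ns()])
    rng = np.random.default_rng(seed_seq)

    print(rng.integers(0, 100, size=5))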
Now you mention that the process IDs might be "linearly dependent" - it is common enough on Linux to have incrementing process IDs. This in itself is not a real problem; any decent PRNG should be robust enough to handle such seeds.
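The same applies if you pass a per-job number in from GNU parallel: an incrementing value works fine as entropy for SeedSequence, because it hashes the values rather than using them directly. A sketch, assuming a hypothetical worker.py that receives GNU parallel's job sequence number {#} as its first argument (combine it with the time as above if you also want run-to-run variation):

    # Started by GNU parallel, e.g. (illustrative):
    #   parallel python worker.py {#} ::: job_a job_b job_c
    # {#} is GNU parallel's job sequence number: 1, 2, 3, ...
    import sys
    import numpy as np

    job_id = int(sys.argv[1])

    # Incrementing job numbers are fine as entropy: SeedSequence hashes them,
    # so consecutive values still produce well-separated generator states.
    rng = np.random.default_rng(np.random.SeedSequence(entropy=job_id))

    print(f"job {job_id}:", rng.random(3))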
One notable exception is cryptography. In that case, the independence of the various PRNGs can be a much bigger concern, but we'd need more details to be sure. That's why the common advice is to use existing crypto libraries.