Enabling/disabling binary flags in GNU Parallel

Time:11-16

I am trying to use GNU Parallel to run a script that has multiple binary flags. I would like to enable/disable these as follows:

Given a script named "sample.py", with two options, "--seed" which takes an integer and "--something" which is a binary flag and takes no input, I would like to construct a call to parallel that produces the following calls:

python sample.py --seed 1111
python sample.py --seed 1111 --something
python sample.py --seed 2222
python sample.py --seed 2222 --something
python sample.py --seed 3333
python sample.py --seed 3333 --something

I've tried things like

parallel python sample.py --seed {1} {2} ::: 1111 2222 3333 ::: "" --something
parallel python sample.py --seed {1} {2} ::: 1111 2222 3333 ::: '' --something
parallel python sample.py --seed {1} {2} ::: 1111 2222 3333 ::: \  --something

but haven't had any luck. Is what I'm trying to achieve possible with GNU parallel? I can modify my script to take explicit TRUE/FALSE values for the flag but I'd prefer to avoid that if possible.

CodePudding user response:

> bash$ cat sample.py 
#!/usr/bin/python3

import sys
import time

time.sleep(0.2)
print(sys.argv)
> bash$ cat split.sh 
#!/bin/sh

exec $*
> bash$ for seed in 1111 2222 3333; do \
    printf "%s\0" "$seed" "$seed --something"; \
done \
| xargs -0 parallel \
    sh split.sh python3 sample.py --
['sample.py', '1111', '--something']
['sample.py', '1111']
['sample.py', '2222']
['sample.py', '2222', '--something']
['sample.py', '3333', '--something']
['sample.py', '3333']

Explanation:

The first part, the loop, just creates the list of arguments:

1111
1111 --something
2222
2222 --something
3333
3333 --something

xargs reads that list from stdin and passes each entry to parallel as a command-line argument.

split.sh splits its arguments on whitespace; this assumes that none of the arguments to your script contain whitespace.
Calling sh split.sh therefore turns a single argument such as 2222 --something into two arguments, 2222 and --something.
Those arguments are passed to python3 sample.py, so parallel ends up running a shell command like python3 sample.py 2222 --something.

If we didn't use split.sh and called python directly (xargs -0 parallel python3 sample.py --), xargs would pass 2222 --something as a single argument, and parallel would have run something like python3 sample.py '2222 --something'.
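
The splitting that split.sh relies on can be tried in plain sh, without xargs or parallel. This is only a sketch: count_args is a hypothetical helper that is not part of the answer, and the set -- line stands in for the single argument that xargs hands over.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): print how many arguments arrived.
count_args() {
    echo "$#"
}

# Simulate the single argument "2222 --something" handed over by xargs.
set -- "2222 --something"

split=$(count_args $*)     # unquoted $*: the shell splits it on whitespace
kept=$(count_args "$@")    # quoted "$@": the argument stays in one piece

echo "unquoted \$*: $split argument(s)"
echo "quoted \"\$@\": $kept argument(s)"
```

The unquoted form is what exec $* in split.sh does, which is why it breaks if an argument legitimately contains whitespace.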

CodePudding user response:

You are so close.

GNU Parallel quotes replacement strings. That usually makes sense, because it is then safe to give it filenames like:

My brother's 12" records, all with ***.csv

which could otherwise give no end of troubles.

However, to be consistent, GNU Parallel also quotes the empty string, so your script receives '' as an extra, empty argument. That is what is hitting you here.

--dry-run shows what is going on:

$ parallel --dry-run python sample.py --seed {1} {2} ::: 1111 2222 3333 ::: '' --something
python sample.py --seed 1111 ''
python sample.py --seed 1111 --something
python sample.py --seed 2222 ''
python sample.py --seed 2222 --something
python sample.py --seed 3333 ''
python sample.py --seed 3333 --something
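
To see why the quoted '' matters, here is a small sketch in plain sh. show_args is a hypothetical stand-in for sample.py (it is not part of the question) that prints every argument it receives in brackets.

```shell
#!/bin/sh
# Hypothetical stand-in for sample.py: print each received argument in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]' "$arg"
    done
    echo
}

with_empty=$(show_args --seed 1111 '')   # what the dry-run line above amounts to
without=$(show_args --seed 1111)         # what the question actually wants

echo "$with_empty"
echo "$without"
```

The first call receives three arguments, the last of them empty; an argument parser will typically treat that empty string as an unexpected positional argument.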

So how can you avoid that?

You can tell the shell to evaluate all strings:

parallel eval python sample.py --seed {1} {2} ::: 1111 2222 3333 ::: '' --something

but that might be a bit of a blunt hammer when you need a scalpel, because eval makes the shell re-parse every argument, not just the empty one. From version 20190722 you can also use {=uq=}. uq() is a Perl function which tells GNU Parallel that this replacement string should not be quoted:

$ parallel-20190722 --dry-run python sample.py --seed {1} {=2 uq=} ::: 1111 2222 3333 ::: '' --something
python sample.py --seed 1111 
python sample.py --seed 1111 --something
python sample.py --seed 2222 
python sample.py --seed 2222 --something
python sample.py --seed 3333 
python sample.py --seed 3333 --something
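
The difference between the two approaches can be imitated in plain sh. This sketch does not run GNU Parallel; count_args is a hypothetical helper and 'my file.csv' is a made-up value, used only to show what a second round of shell parsing does to arguments.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments arrived.
count_args() {
    echo "$#"
}

# The quoted empty string is a real argument...
plain=$(count_args --seed 1111 '')
# ...but eval joins its arguments with spaces and parses the result again,
# so the empty string disappears.
evaled=$(eval count_args --seed 1111 '')

# The same re-parsing also splits arguments that were meant to stay whole.
name='my file.csv'                        # made-up value for illustration
plain_name=$(count_args "$name")
evaled_name=$(eval count_args "$name")

echo "empty string:  plain=$plain eval=$evaled"
echo "spaced string: plain=$plain_name eval=$evaled_name"
```

With eval, every replacement string goes through that second round of parsing, whereas {=2 uq=} turns quoting off for the second input source only and leaves {1} quoted.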