I want to run a simple script, "script.py", which runs some calculations and periodically writes out a step_000n.txt file, with n depending on the total execution time. I would then like Snakemake to run another rule on all of the generated files. What would be the proper Snakefile for this? That is:
1. run script.py
2. get step_000{1,2,3,4 ..}.txt (n is variable and not known in advance)
3. apply `process.py -in step_000{n}.txt -out step_000{n}.png` on all step_000{1,2,3,4 ..}.txt
My obviously wrong attempt is below:

    rule all:
        input: expand("{step}.png", step=list(map(lambda x: x.split(".")[0], glob.glob("model0*.txt"))))

    rule txt:
        input: "{step}.txt"
        output: "{step}.png"
        shell:
            "process.py -in {input} -out {output}"

    rule first:
        output: "{step}.txt"
        script: "script.py"

I could not figure out how to define the output target here.
Answer:
I would write all the step_000n.txt files to a dedicated directory and then process all the files in that directory. Something like:

    rule all:
        input:
            'processed.txt',

    rule split:
        output:
            directory('processed_dir'),
        shell:
            r"""
            # Write out step_001.txt, step_002.txt, ..., step_000n.txt
            # in output directory `processed_dir`
            mkdir {output}
            script.py ...
            """

    rule process:
        input:
            indir= 'processed_dir',
        output:
            out= 'processed.txt',
        shell:
            r"""
            process.py -n {input.indir}/step_*.txt -out {output.out}
            """
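
If you need one .png per step file (as in the question) rather than a single processed.txt, the list of targets can only be computed after script.py has finished, since n is not known beforehand. Below is a minimal sketch of the plain-Python part of that: a helper that lists the step files in the output directory and maps each one to its .png name. The function name `png_targets` is hypothetical, and the `step_*.txt` pattern is assumed from the file names above.

```python
from pathlib import Path


def png_targets(step_dir):
    """Return the .png path expected for every step_*.txt file in step_dir.

    Assumption: the step files follow the step_*.txt naming used above.
    """
    # Sort so the target list is stable between runs.
    txt_files = sorted(Path(step_dir).glob("step_*.txt"))
    # step_0001.txt -> step_0001.png, in the same directory.
    return [str(p.with_suffix(".png")) for p in txt_files]
```

The helper must not be evaluated while the Snakefile is being parsed, because the directory is still empty at that point (this is why the `glob.glob` call in `rule all` of the question finds nothing). In Snakemake this is typically handled by declaring the producing rule as a `checkpoint` and calling such a helper from an input function of an aggregating rule, after `checkpoints.split.get(**wildcards)` has returned the finished output directory. Treat that wiring as an assumption to check against the Snakemake documentation for your version.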