Apply snakemake rule on all generated files

Time:06-13

I want to run a simple script, "script.py", which runs some calculations and periodically writes out a step_000n.txt file, where n depends on the total execution time. I would then like snakemake to run another rule on all of the generated files. What would be the proper Snakefile input? That is:

1. run script.py
2. get step_000{1,2,3,4 ..}.txt (n being variable and not determined)
3. apply `process.py -in step_000{n}.txt -out step_000{n}.png` on all step_000{1,2,3,4 ..}.txt

My obviously wrong attempt is below


rule all:
    input: expand("{step}.png", step=list(map(lambda x: x.split(".")[0], glob.glob("model0*.txt"))))

rule txt:
    input: "{step}.txt"
    output: "{step}.png"
    shell:
        "process.py -in {input} -out {output}"

rule first:
    output: "{step}.txt"
    script: "script.py"

I could not figure out how to define output target here.

CodePudding user response:

I would write all the step_000n.txt files to a dedicated directory and then process all the files in that directory. Something like:

rule all:
    input:
        'processed.txt',


rule split:
    output:
        directory('processed_dir'),
    shell:
        r"""
        # Write out step_0001.txt, step_0002.txt, ..., step_000n.txt
        # into the output directory `processed_dir`
        mkdir {output}
        script.py ...
        """


rule process:
    input:
        indir='processed_dir',
    output:
        out='processed.txt',
    shell:
        r"""
        process.py -in {input.indir}/step_*.txt -out {output.out}
        """