Home > Software engineering >  Non traditionally Snakemake output for next rules input
Non traditionally Snakemake output for next rules input

Time:05-02

I have there weird Snakemake rule I want to create that the Snakemake documentation doesn't describe. I have rule picard_dupes that produces three outputs, metrics, bams, and bai files. However unlike bams and metrics that are created through Output, bai is created in with CREATE_INDEX that isn't in the output. The creates a .bai not a bam.bai file. in rule picard_bai renames this file to be recognized in later rules. However, I am getting an error: Missing input files for rule picard_bai. This is telling my Snakemake is trying to execute picard_bai before picard_dup. How can I get this to work where my bai is output in not in the output to move to next rule for input?

rule picard_dupes:
    input: rules.star_aligner.output.bam
    output:
        bam = 'picard/{sampleID}_marked_duplicates.bam',
        metric = 'picard/{sampleID}_marked_dup_metrics.txt'
    threads: 12
    run:
        shell('picard MarkDuplicates \
            -Xmx4G \
            -XX:ParallelGCThreads={threads} \
            I={input} \
            O={output.bam} \
            M={output.metric} \
            CREATE_INDEX=true ASSUME_SORT_ORDER=coordinate OPTICAL_DUPLICATE_PIXEL_DISTANCE=100')

rule picard_bai:
    input: 'picard/{sampleID}_marked_duplicates.bai'
    output:
        bai = 'picard/{sampleID}_marked_duplicates.bam.bai'
    run:
        shell('mv {input} {output.bai}')

CodePudding user response:

Snakemake only knows what you tell it. In this case, you aren't letting snakemake know that picard_dupes will also create the bai file. I like to call these implicit outputs because you don't explicitly create them in your shell command. Simply change your output directive to:

    output:
        bam = 'picard/{sampleID}_marked_duplicates.bam',
        metric = 'picard/{sampleID}_marked_dup_metrics.txt',
        index = 'picard/{sampleID}_marked_duplicates.bai',

As a minor enhancement, you can skip the picard_bai altogether and perform the rename in picard_dupes"

rule picard_dupes:
    input: rules.star_aligner.output.bam
    output:
        bam = 'picard/{sampleID}_marked_duplicates.bam',
        metric = 'picard/{sampleID}_marked_dup_metrics.txt',
        index = 'picard/{sampleID}_marked_duplicates.bai',
    threads: 12
    run:
        shell('picard MarkDuplicates \
            -Xmx4G \
            -XX:ParallelGCThreads={threads} \
            I={input} \
            O={output.bam} \
            M={output.metric} \
            CREATE_INDEX=true ASSUME_SORT_ORDER=coordinate OPTICAL_DUPLICATE_PIXEL_DISTANCE=100')
        bai_file = output.bam[:-1]   'i'  # generate bai filename
        shell('mv {bai_file} {output.index}')
  • Related