I have there weird Snakemake rule I want to create that the Snakemake documentation doesn't describe. I have rule picard_dupes that produces three outputs, metrics, bams, and bai files. However unlike bams and metrics that are created through Output, bai is created in with CREATE_INDEX that isn't in the output. The creates a .bai not a bam.bai file. in rule picard_bai renames this file to be recognized in later rules. However, I am getting an error: Missing input files for rule picard_bai. This is telling my Snakemake is trying to execute picard_bai before picard_dup. How can I get this to work where my bai is output in not in the output to move to next rule for input?
rule picard_dupes:
input: rules.star_aligner.output.bam
output:
bam = 'picard/{sampleID}_marked_duplicates.bam',
metric = 'picard/{sampleID}_marked_dup_metrics.txt'
threads: 12
run:
shell('picard MarkDuplicates \
-Xmx4G \
-XX:ParallelGCThreads={threads} \
I={input} \
O={output.bam} \
M={output.metric} \
CREATE_INDEX=true ASSUME_SORT_ORDER=coordinate OPTICAL_DUPLICATE_PIXEL_DISTANCE=100')
rule picard_bai:
input: 'picard/{sampleID}_marked_duplicates.bai'
output:
bai = 'picard/{sampleID}_marked_duplicates.bam.bai'
run:
shell('mv {input} {output.bai}')
CodePudding user response:
Snakemake only knows what you tell it. In this case, you aren't letting snakemake know that picard_dupes will also create the bai file. I like to call these implicit outputs because you don't explicitly create them in your shell command. Simply change your output directive to:
output:
bam = 'picard/{sampleID}_marked_duplicates.bam',
metric = 'picard/{sampleID}_marked_dup_metrics.txt',
index = 'picard/{sampleID}_marked_duplicates.bai',
As a minor enhancement, you can skip the picard_bai altogether and perform the rename in picard_dupes"
rule picard_dupes:
input: rules.star_aligner.output.bam
output:
bam = 'picard/{sampleID}_marked_duplicates.bam',
metric = 'picard/{sampleID}_marked_dup_metrics.txt',
index = 'picard/{sampleID}_marked_duplicates.bai',
threads: 12
run:
shell('picard MarkDuplicates \
-Xmx4G \
-XX:ParallelGCThreads={threads} \
I={input} \
O={output.bam} \
M={output.metric} \
CREATE_INDEX=true ASSUME_SORT_ORDER=coordinate OPTICAL_DUPLICATE_PIXEL_DISTANCE=100')
bai_file = output.bam[:-1] 'i' # generate bai filename
shell('mv {bai_file} {output.index}')