snakemake module from github changes targets?


I hope you can help me solve my issue, or tell me whether I should submit a bug report.

I'm 'importing' a snakemake module from github in another, local snakefile. This appears to mess up the target of the local snakefile: when the second snakefile is imported, the target is no longer the one specified by rule 'all' but some arbitrary (?) rule from the imported snakefile, even when the imported snakefile does not contain any relevant rules.

I've compiled an example set of two repos on github which suffer from this problem (lpagie/repo1 and lpagie/repo2). From repo1/readme.md:

==============

This repo is set up to illustrate a problem (?) with using snakemake modules from github

Clone this repo locally and run snakemake from a directory above the cloned repo, using the wrapper run.sh

This snakefile 'imports' lpagie/repo2, which in its current form contains only commented-out rules plus one rule which is (supposedly) not meaningful for repo1.
Running the snakemake of repo1 will not generate the output specified by rule 'all' (output/final) but instead the output generated by rule 'nonsense' ....

When the import of the repo2 module is commented out in repo1/snakefile_1.smk, running snakemake generates the expected outcome.

=============
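
For context, the one active rule in repo2/snakefile_2.smk is along these lines (a sketch: the rule name, output path, and script name are taken from the log further down; the exact script path and body may differ):

rule nonsense:
  output:
    "output/nonsense.out"
  script:
    "scripts/touch.py"  # guessed path; the log below only shows snakemake's cached copy of touch.py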

Am I overlooking something obvious?
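
For reference, the import in repo1/snakefile_1.smk follows the module pattern from the snakemake docs, roughly like this (paraphrased; see the repo for the exact version):

module other_workflow:
  snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
  config: config

use rule * from other_workflow as wf2_*

rule all:
  input:
    "output/final"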

I'm using snakemake v6.9.1, installed via conda.
Here's the output I get from a clean clone of repo1 and repo2, running repo1/run.sh:

git clone [email protected]:lpagie/repo1.git
git clone [email protected]:lpagie/repo2.git

bash repo1/run.sh 
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
wf2_nonsense        1              1              1
total               1              1              1

Select jobs to execute...

[Wed Oct  6 17:04:44 2021]
rule wf2_nonsense:
    input: /tmp/tmph_6w4l9asnakemake-runtime-source-cache/bfcfa05f3052febb0b88b59991e4aac562b3465cfdb8f8d288a357884ae7572b
    output: output/nonsense.out
    jobid: 0
    reason: Missing output files: output/nonsense.out
    resources: tmpdir=/tmp

/data/home/ludo/miniconda3/bin/python3.8 /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/scripts/tmpu8huybi8.touch.py
repo2
/data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos
[Wed Oct  6 17:04:45 2021]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170442.797027.snakemake.log

The same run after commenting out the lines that import the repo2 module:

vi repo1/snakefile_1.smk

bash repo1/run.sh 
repo_dir = /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1
Building DAG of jobs...                                                                                                                 
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.                                                                                        
Job stats:                
job      count    min threads    max threads
-----  -------  -------------  -------------
A            1              1              1
B            1              1              1
all          1              1              1
total        3              1              1
                                  
Select jobs to execute...
                                                                    
[Wed Oct  6 17:08:18 2021]
rule B:
    output: output/fB     
    jobid: 2   
    reason: Missing output files: output/fB
    resources: tmpdir=/tmp                                                                                                                                                
bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/fB
[Wed Oct  6 17:08:18 2021]
Finished job 2.
1 of 3 steps (33%) done
Select jobs to execute...

[Wed Oct  6 17:08:18 2021]
rule A:
    input: output/fB
    output: output/final
    jobid: 1
    reason: Missing output files: output/final; Input files updated by another job: output/fB
    resources: tmpdir=/tmp

bash /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/repo1/scripts/touch.sh output/final
[Wed Oct  6 17:08:18 2021]
Finished job 1.
2 of 3 steps (67%) done
Select jobs to execute...

[Wed Oct  6 17:08:18 2021]
localrule all:
    input: output/final
    jobid: 0
    reason: Input files updated by another job: output/final
    resources: tmpdir=/tmp

[Wed Oct  6 17:08:18 2021]
Finished job 0.
3 of 3 steps (100%) done
Complete log: /data/home/ludo/projects/20211005_test_snakemake_submodules/test_repos/.snakemake/log/2021-10-06T170818.178572.snakemake.log

I created lpagie/repo3, which is a copy of repo1 with the lines that import the repo2 module commented out.

CodePudding user response:

In snakemake, the first rule that appears in a snakefile is the default target. Your 'use rule * from other_workflow' statement comes before rule all, so the first rule imported from the remote module (here wf2_nonsense) becomes the default target of the pipeline instead of rule all.
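
The effect is easy to reproduce with a plain two-rule snakefile, no modules involved (a minimal sketch):

# With no target named on the command line, snakemake builds rule 'first'
# simply because it comes first in the file; second.txt is never created.
rule first:
  output: "first.txt"
  shell: "touch {output}"

rule second:
  output: "second.txt"
  shell: "touch {output}"

Since 'use rule *' injects the module's rules at the point of the statement, the first injected rule plays the role of 'first' here.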

So just put the 'use rule' statement after rule all. Instead of this:

module other_workflow:
  snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
  config: config

use rule * from other_workflow as wf2_*

rule all:
  input:
    "output/final"

Try:

module other_workflow:
  snakefile: github("lpagie/repo2", path="snakefile_2.smk", commit="61f60f7")
  config: config

rule all:
  input:
    "output/final"

use rule * from other_workflow as wf2_*
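
With this ordering, rule all is the first rule snakemake encounters, so it stays the default target while the imported rules remain available under the wf2_ prefix. Two alternatives worth knowing: you can always name the target explicitly on the command line (e.g. snakemake --cores 1 all), and snakemake versions newer than the 6.9.1 used here also support marking the default target explicitly, along these lines (check the docs of your version):

rule all:
  input:
    "output/final"
  default_target: True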

(As an aside, the github() marker is a recent addition, so this will only work with snakemake >= 6.9.)
