How to use a wildcard within expand function parameters in snakemake?

Time:02-23

I have a JSON file like so:

{
    "foo": {
        "bar1": 
            {"A1": {"name": "A1", "path": "/path/to/A1"}, 
             "B1": {"name": "B1", "path": "/path/to/B1"},
             "C1": {"name": "C1", "path": "/path/to/C1"},
             "D1": {"name": "D1", "path": "/path/to/D1"}},
        "bar2": 
            {"A2": {"name": "A2", "path": "/path/to/A2"}, 
             "B2": {"name": "B2", "path": "/path/to/B2"},
             "C2": {"name": "C2", "path": "/path/to/C2"},
             "D2": {"name": "D2", "path": "/path/to/D2"}}}
}

I am trying to run my Snakemake pipeline on the samples in sample sets 'bar1' and 'bar2' separately, putting the results into their own folders. When I expand my wildcards, I don't want every combination of sample set and sample; I only want each sample paired with its own sample set, like this:

tmp/bar1/A1.bam
tmp/bar1/B1.bam
tmp/bar1/C1.bam
tmp/bar1/D1.bam
tmp/bar2/A2.bam
tmp/bar2/B2.bam
tmp/bar2/C2.bam
tmp/bar2/D2.bam

Hopefully my snakefile will help explain. I have tried having my snakefile like this:

sample_sets = [ i for i in config['foo'] ]

samples_dict = config['foo'] #cleans it up

def get_samples(wildcards):
    return list(samples_dict[wildcards.sample_set].keys())

rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = get_samples), sample_set = sample_sets),

This doesn't work: my file names end up with "<function get_samples at 0x7f6e00544320>" in them! I have also tried:

rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = list(samples_dict["{{sample_set}}"].keys())), sample_set = sample_sets),

but that gets a KeyError. I have also tried this:

rule all:
    input:
        [ ["tmp/{{sample_set}}/{sample}.aligned_bam.core.bam".format( sample = sample ) for sample in list(samples_dict[sample_set].keys())] for sample_set in sample_sets ]

which gets a "Wildcards in input files cannot be determined from output files: 'sample_set'" error.

I feel like there must be a simple way of doing this and perhaps I'm being a moron.

Any help would be very much appreciated! And let me know if I've missed some detail.

Answer:

expand does not call a function passed as a value, and it does not resolve wildcards inside its arguments, which is why the first two attempts fail. It does accept a custom combinatoric function; most often this is zip. In your case, however, the nested dictionary would require designing a custom function. A simpler solution is to use plain Python to construct the list of desired files:

d = {
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
            "C1": {"name": "C1", "path": "/path/to/C1"},
            "D1": {"name": "D1", "path": "/path/to/D1"},
        },
        "bar2": {
            "A2": {"name": "A2", "path": "/path/to/A2"},
            "B2": {"name": "B2", "path": "/path/to/B2"},
            "C2": {"name": "C2", "path": "/path/to/C2"},
            "D2": {"name": "D2", "path": "/path/to/D2"},
        },
    }
}

list_files = []

for key in d["foo"]:
    for nested_key in d["foo"][key]:
        _tmp = f"tmp/{key}/{nested_key}.bam"
        list_files.append(_tmp)

print(*list_files, sep="\n")
#tmp/bar1/A1.bam
#tmp/bar1/B1.bam
#tmp/bar1/C1.bam
#tmp/bar1/D1.bam
#tmp/bar2/A2.bam
#tmp/bar2/B2.bam
#tmp/bar2/C2.bam
#tmp/bar2/D2.bam
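
To connect the loop above to a Snakefile, here is a minimal sketch. It assumes the JSON is loaded into Snakemake's config dict (as in the question); the helper names target_files and paired_lists are made up for illustration and are not part of Snakemake. The first helper builds the paths directly. The second flattens the dictionary into two parallel lists, which is roughly what the zip combinator mentioned above would need. Note that the third attempt in the question was close: with str.format, "{{sample_set}}" is an escaped brace pair and stays as the literal text "{sample_set}", so single braces with both values supplied should avoid the wildcard error.

```python
# Hypothetical stand-in for the Snakemake `config` dict loaded from the JSON file.
config = {
    "foo": {
        "bar1": {s: {"name": s, "path": f"/path/to/{s}"} for s in ("A1", "B1", "C1", "D1")},
        "bar2": {s: {"name": s, "path": f"/path/to/{s}"} for s in ("A2", "B2", "C2", "D2")},
    }
}


def target_files(samples_dict, pattern="tmp/{sample_set}/{sample}.bam"):
    """Return one path per (sample_set, sample) pair that actually occurs."""
    return [
        pattern.format(sample_set=sample_set, sample=sample)
        for sample_set, samples in samples_dict.items()
        for sample in samples
    ]


def paired_lists(samples_dict):
    """Flatten the nested dict into two parallel lists of equal length."""
    pairs = [
        (sample_set, sample)
        for sample_set, samples in samples_dict.items()
        for sample in samples
    ]
    set_names = [sample_set for sample_set, _ in pairs]
    sample_names = [sample for _, sample in pairs]
    return set_names, sample_names


bam_files = target_files(config["foo"])
set_names, sample_names = paired_lists(config["foo"])

# In a Snakefile, either form could serve as the input of `rule all`:
#
#   rule all:
#       input: bam_files
#
# or, using expand with zip as the combinator:
#
#   rule all:
#       input: expand("tmp/{sample_set}/{sample}.bam", zip,
#                     sample_set=set_names, sample=sample_names)
```

Both forms give rule all a plain list of concrete paths, so the downstream rule can keep "tmp/{sample_set}/{sample}.bam" as its output and let Snakemake infer the wildcards from each requested file.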