I have a JSON config file like so:
{
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
            "C1": {"name": "C1", "path": "/path/to/C1"},
            "D1": {"name": "D1", "path": "/path/to/D1"}
        },
        "bar2": {
            "A2": {"name": "A2", "path": "/path/to/A2"},
            "B2": {"name": "B2", "path": "/path/to/B2"},
            "C2": {"name": "C2", "path": "/path/to/C2"},
            "D2": {"name": "D2", "path": "/path/to/D2"}
        }
    }
}
I am trying to run my Snakemake pipeline on the samples in sample sets 'bar1' and 'bar2' separately, putting the results into their own folders. When I expand my wildcards, I don't want every combination of sample sets and samples; I only want each sample paired with its own set, like this:
tmp/bar1/A1.bam
tmp/bar1/B1.bam
tmp/bar1/C1.bam
tmp/bar1/D1.bam
tmp/bar2/A2.bam
tmp/bar2/B2.bam
tmp/bar2/C2.bam
tmp/bar2/D2.bam
Hopefully my Snakefile will help explain. I have tried this:
sample_sets = [ i for i in config['foo'] ]
samples_dict = config['foo'] #cleans it up

def get_samples(wildcards):
    return list(samples_dict[wildcards.sample_set].keys())

rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = get_samples), sample_set = sample_sets),
This doesn't work: my file names end up with "<function get_samples at 0x7f6e00544320>" in them! I have also tried:
rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = list(samples_dict["{{sample_set}}"].keys())), sample_set = sample_sets),
but that gets a KeyError. I have also tried this:
rule all:
    input:
        [ ["tmp/{{sample_set}}/{sample}.aligned_bam.core.bam".format( sample = sample ) for sample in list(samples_dict[sample_set].keys())] for sample_set in sample_sets ]
which gets a "Wildcards in input files cannot be determined from output files: 'sample_set'" error.
I feel like there must be a simple way of doing this and perhaps I'm being a moron.
Any help would be very much appreciated! And let me know if I've missed some detail.
Answer:
It is possible to pass a custom combinatoric function to expand. Most often this function is zip; however, in your case the nested dictionary shape would require designing a custom function. A simpler solution is to use plain Python to construct the list of desired files:
d = {
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
            "C1": {"name": "C1", "path": "/path/to/C1"},
            "D1": {"name": "D1", "path": "/path/to/D1"},
        },
        "bar2": {
            "A2": {"name": "A2", "path": "/path/to/A2"},
            "B2": {"name": "B2", "path": "/path/to/B2"},
            "C2": {"name": "C2", "path": "/path/to/C2"},
            "D2": {"name": "D2", "path": "/path/to/D2"},
        },
    }
}
list_files = []
for key in d["foo"]:
    for nested_key in d["foo"][key]:
        _tmp = f"tmp/{key}/{nested_key}.bam"
        list_files.append(_tmp)
print(*list_files, sep="\n")
#tmp/bar1/A1.bam
#tmp/bar1/B1.bam
#tmp/bar1/C1.bam
#tmp/bar1/D1.bam
#tmp/bar2/A2.bam
#tmp/bar2/B2.bam
#tmp/bar2/C2.bam
#tmp/bar2/D2.bam
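To tie this back to the Snakefile, here is a minimal sketch of the same idea written against a `config`-shaped dictionary. The names `pairs`, `bam_files`, `set_names` and `sample_names` are made up for illustration, and the dictionary is a shortened stand-in for what Snakemake would load from the JSON file.

```python
# Stand-in for the `config` dict that Snakemake builds from the JSON file
# (shortened to two samples per set; this is an assumption for the example).
config = {
    "foo": {
        "bar1": {
            "A1": {"name": "A1", "path": "/path/to/A1"},
            "B1": {"name": "B1", "path": "/path/to/B1"},
        },
        "bar2": {
            "A2": {"name": "A2", "path": "/path/to/A2"},
            "B2": {"name": "B2", "path": "/path/to/B2"},
        },
    }
}

# (sample_set, sample) pairs: each sample stays with its own set only.
pairs = [
    (sample_set, sample)
    for sample_set, samples in config["foo"].items()
    for sample in samples
]

# Option 1: build the target file names directly.
bam_files = [f"tmp/{sample_set}/{sample}.bam" for sample_set, sample in pairs]

# Option 2: two parallel lists of equal length, which is the shape
# that expand(..., zip, ...) pairs up element by element.
set_names = [sample_set for sample_set, _ in pairs]
sample_names = [sample for _, sample in pairs]

print(*bam_files, sep="\n")
```

In the Snakefile, the input of rule all could then simply be `bam_files`, or, if you prefer expand, something like `expand("tmp/{sample_set}/{sample}.bam", zip, sample_set=set_names, sample=sample_names)`. As far as I can tell, the first attempt in the question fails because expand does not call functions: it formats whatever value it is given into the pattern, so the function object itself ends up in the file name.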