Suppose I have two strings: a template string and a target string.
Template string: "param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}"
Target string: "param1~1.1/param2~a_b_c_param3~345"
I would like to create a function that takes a template string and a target string as arguments. The function returns a wildcard object (or a dictionary, or some other data structure that can hold key-value pairs) where each key is a wildcard name (e.g. wildcard1) and each value is the value of that wildcard in the target string (e.g. 1.1).
Here's a little example:
def extract_wildcards(template, target):
    ... (implementation) ...
    return wildcards
Usage:
wildcards = extract_wildcards("param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}",
                              "param1~1.1/param2~a_b_c_param3~345")
print(wildcards.wildcard1)
>"1.1"
print(wildcards.wildcard2)
>"a_b_c"
print(wildcards.wildcard3)
>"345"
Note that type casting is not necessary! All the values should be strings. Wildcard names in the template will always be surrounded by {}. For the target string, you can assume that there is always exactly one way to extract the wildcard values (no ambiguity about what each wildcard can be).
CodePudding user response:
This works for your example. The idea is to turn the Snakemake template string into a regex pattern and extract the matches. The literal parts of the template are not escaped, so a filename containing characters that are special in a regex (such as . or +) may produce an unexpected pattern; keep that in mind.
import re

def extract_wildcards(template, target):
    # Collect the wildcard names, e.g. "{wildcard1}" -> "wildcard1"
    names = re.findall("{[^{]*}", template)
    names = [name.strip("{}") for name in names]
    # Replace every {wildcard} with a capture group
    q = re.sub("{[^{]*}", "(.*)", template)
    matches = re.search(q, target)
    wildcards = dict()
    for i, name in enumerate(names):
        wildcards[name] = matches.group(i + 1)
    return wildcards
example:
template="param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}"
target="param1~1.1/param2~a_b_c_param3~345"
print(extract_wildcards(template, target))
> {'wildcard1': '1.1', 'wildcard2': 'a_b_c', 'wildcard3': '345'}
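If the regex weirdness is a concern, here is a minimal sketch of a stricter variant. It is not part of the answer's code: the name extract_wildcards_escaped is made up for illustration, and it assumes wildcard names are valid Python identifiers that appear only once in the template. It escapes the literal text with re.escape, uses named groups, and requires the whole target to match.

```python
import re

# Matches a {name} placeholder; assumes names are valid Python identifiers.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)\}")

def extract_wildcards_escaped(template, target):
    """Hypothetical variant: escape literal text, capture wildcards by name."""
    pattern = ""
    pos = 0
    for m in _PLACEHOLDER.finditer(template):
        # Escape the literal text so '.', '+', etc. only match themselves.
        pattern += re.escape(template[pos:m.start()])
        # One named, non-empty capture group per wildcard.
        pattern += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    # fullmatch: the entire target has to fit the template.
    match = re.fullmatch(pattern, target)
    if match is None:
        raise ValueError(f"{target!r} does not match {template!r}")
    return match.groupdict()
```

With this version, a template like "file.{ext}" no longer matches a target such as "fileXtxt", because the dot is treated as a literal character; a mismatch raises ValueError instead of failing later on matches.group.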
I'm not sure what the goal of doing this is, but it doesn't feel like a very "Snakemake"-esque solution. Have you considered trying an input function? https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html?highlight=function#input-functions This type of function gets passed the wildcards automatically.
CodePudding user response:
This is what snakemake uses internally:
template = "param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}"
target = "param1~1.1/param2~a_b_c_param3~345"
# pip install parse
from parse import parse
wildcards = parse(template, target).named
print(wildcards)
# {'wildcard1': '1.1', 'wildcard2': 'a_b_c', 'wildcard3': '345'}
This should be bundled with snakemake (as an external dependency); in case it's not, one can install parse via pip:
pip install parse
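The question asks for attribute access (wildcards.wildcard1), while the approaches here return a plain dictionary. A small sketch of one way to bridge that, using types.SimpleNamespace from the standard library; the helper name as_namespace is made up for illustration, and it assumes every wildcard name is a valid Python identifier:

```python
from types import SimpleNamespace

def as_namespace(wildcards):
    """Hypothetical helper: expose the dictionary keys as attributes."""
    return SimpleNamespace(**wildcards)

# The dictionary below stands in for the result of either approach above.
wildcards = as_namespace(
    {"wildcard1": "1.1", "wildcard2": "a_b_c", "wildcard3": "345"}
)
print(wildcards.wildcard1)  # prints 1.1
print(wildcards.wildcard2)  # prints a_b_c
```

The values stay strings, so no type casting happens along the way.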