Snakemake: Manually creating a wildcard from a string

Time:04-05

Suppose I have two strings: a template string and a target string.

Template string: "param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}"

Target string: "param1~1.1/param2~a_b_c_param3~345"

I would like to create a function that takes a template string and a target string as arguments. The function returns a wildcards object (or a dictionary, or any other data structure that holds key-value pairs) where each key is a wildcard name (e.g. wildcard1) and each value is that wildcard's value in the target string (e.g. 1.1).

Here's a little example:

def extract_wildcards(template, target):
    ... (implementation) ...
    return wildcards

Usage:

wildcards = extract_wildcards("param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}",
                              "param1~1.1/param2~a_b_c_param3~345")
print(wildcards.wildcard1)
>"1.1"
print(wildcards.wildcard2)
>"a_b_c"
print(wildcards.wildcard3)
>"345"

Note that type casting is not necessary! All the values should be strings. Wildcard names in the template will always be surrounded by {}. For the target string, you can assume that there is always exactly one way to extract the wildcard values (no ambiguity about what each wildcard can be).

CodePudding user response:

This works for your example. The idea is to turn the Snakemake template string into a regex pattern and extract the matches. Keep in mind that the literal parts of the template are not escaped, so a filename containing regex metacharacters (such as . or +) can produce an unexpected pattern.

import re

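def extract_wildcards(template, target):
    # Collect the wildcard names in order, without the surrounding braces
    names = re.findall(r"{[^{]*}", template)
    names = [name.strip("{}") for name in names]

    # Turn every {wildcard} into a capture group; the rest of the template
    # is used as-is, so regex metacharacters in it are not escaped
    q = re.sub(r"{[^{]*}", "(.*)", template)
    matches = re.search(q, target)

    wildcards = dict()
    for i, name in enumerate(names):
        wildcards[name] = matches.group(i + 1)  # group 0 is the whole match
    return wildcards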

Example:

template="param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}"
target="param1~1.1/param2~a_b_c_param3~345"
print(extract_wildcards(template, target))
  > {'wildcard1': '1.1', 'wildcard2': 'a_b_c', 'wildcard3': '345'}
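
The caveat about unexpected regex patterns can be reduced by escaping the literal parts of the template. The following is only a sketch of that idea, not Snakemake's own matching logic: the name extract_wildcards_strict is made up for illustration, it assumes every wildcard name is a valid Python identifier that appears once, and it does not handle Snakemake's {name,regex} constraint syntax. It returns a SimpleNamespace so the values can be read as attributes, as in the question.

```python
import re
from types import SimpleNamespace

def extract_wildcards_strict(template, target):
    """Return a namespace of wildcard values, or None if target does not match.

    Hypothetical helper: assumes unique, identifier-like wildcard names.
    """
    # Splitting on a capturing group alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)        # literal text, matched verbatim
        else:
            pattern += f"(?P<{part}>.+)"      # wildcard, captured by name
    # fullmatch requires the whole target to fit the template
    match = re.fullmatch(pattern, target)
    if match is None:
        return None
    return SimpleNamespace(**match.groupdict())

wildcards = extract_wildcards_strict(
    "param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}",
    "param1~1.1/param2~a_b_c_param3~345",
)
print(wildcards.wildcard1, wildcards.wildcard2, wildcards.wildcard3)
```

With the literals escaped, a template such as "sample.{id}.txt" only matches targets that contain real dots, and a target that does not fit the template yields None instead of raising an AttributeError.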

I'm not sure what the goal of doing this is, but it doesn't feel like a very "Snakemake"-esque solution. Have you considered trying an input function? See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html?highlight=function#input-functions. This type of function gets passed the wildcards automatically.
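
As a rough illustration of the input-function suggestion, the sketch below shows a function that receives the wildcards object and builds an input path from it. The function name raw_data_path, the rule name, and all file paths are invented for this example; only the wildcard names come from the question.

```python
from types import SimpleNamespace

def raw_data_path(wildcards):
    """Input function: receives the wildcards matched from the rule's output."""
    # Hypothetical path layout, chosen only for illustration
    return f"raw/{wildcards.wildcard2}/run~{wildcards.wildcard1}.csv"

# In a Snakefile the function would be referenced by name, for example:
#
# rule process:
#     input: raw_data_path
#     output: "results/param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}.txt"
#     shell: "cp {input} {output}"

# Outside Snakemake, a SimpleNamespace can stand in for the wildcards object
fake_wildcards = SimpleNamespace(wildcard1="1.1", wildcard2="a_b_c", wildcard3="345")
example_path = raw_data_path(fake_wildcards)
print(example_path)
```

In this setup Snakemake does the template matching itself, so no manual parsing of the output path is needed.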

CodePudding user response:

This is what snakemake uses internally:

template = "param1~{wildcard1}/param2~{wildcard2}_param3~{wildcard3}"
target = "param1~1.1/param2~a_b_c_param3~345"

# pip install parse
from parse import parse

wildcards = parse(template, target).named
print(wildcards)
# {'wildcard1': '1.1', 'wildcard2': 'a_b_c', 'wildcard3': '345'}

This should be bundled with Snakemake (as an external dependency). In case it's not, parse can be installed via pip:

pip install parse