I have a rule using files with a series of numbers from 01, 02, 03... up to 12 in their file names, and I need to format them as 1, 2, 3... 12 for the next step in the analysis.
I am sure there is a way to do this with either f-strings or .format(), but I am not sure how to do it within one rule where I also specify the number series with a list. How do I get there?
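Outside of Snakemake, both directions of the conversion are one-liners in plain Python. The catch is that an integer format spec such as 02d only applies to numbers, so a string like "01" has to go through int() first. A minimal sketch in plain Python, with no Snakemake involved:

```python
numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

# Drop the leading zero: int() parses "01" as 1, str() turns it back into "1".
# (n.lstrip("0") gives the same result for this list, but would turn "00" into "".)
unpadded = [str(int(n)) for n in numbers]

# Add the zero back: the "02d" format spec needs an int, not a str.
padded_fstring = [f"{int(n):02d}" for n in unpadded]
padded_format = ["{:02d}".format(int(n)) for n in unpadded]

print(unpadded[:3])        # ['1', '2', '3']
print(padded_fstring[:3])  # ['01', '02', '03']

# Applying "d" to the string itself fails, because "d" is an integer format code.
try:
    f"{numbers[0]:02d}"
except ValueError as err:
    print("ValueError:", err)
```

Inside a Snakefile the same expressions work, as long as they are evaluated on actual values (for example in a list comprehension or an input function) rather than on the {wildcard} placeholder text.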
A minimal example (not working):
numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
starting_folder = "project/temp"

rule rename_files:
    input: f"{starting_folder}/file.{{numbers}}.ext"
    output: f"{starting_folder}/file.{{{numbers}}:01d}_new.ext"
    shell: "ln -s {input} {output}"
For example, I would like to get project/temp/file.1_new.ext as the output file path.
Answer:
The missing steps are:
- use .lstrip to specify the desired format of the target files;
- request all the target files (the ones without leading zeros);
- in the rule that requires both versions, start with the reduced integer (without leading zeros) and add the leading zeros back to get the source (original) files.
numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
starting_folder = "project/temp"

rule all:
    input: [f"{starting_folder}/file.{n.lstrip('0')}_new.ext" for n in numbers]

rule rename_files:
    # wildcards are strings, so convert n to int before zero-padding it
    input: lambda wildcards: f"{starting_folder}/file.{int(wildcards.n):02d}.ext"
    output: f"{starting_folder}/file.{{n}}_new.ext"
    shell: "ln -s {input} {output}"
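To check the mapping from requested targets back to source files without running Snakemake, the logic of these steps can be imitated in plain Python. This is a sketch under assumptions: output_regex is a hypothetical, simplified stand-in for how Snakemake turns the output pattern into a regular expression (its default wildcard constraint is .+), and types.SimpleNamespace stands in for the wildcards object Snakemake passes to input functions.

```python
import re
from types import SimpleNamespace

numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
starting_folder = "project/temp"

# Targets requested by "rule all": the numbers without leading zeros.
targets = [f"{starting_folder}/file.{n.lstrip('0')}_new.ext" for n in numbers]

# Hypothetical, simplified version of Snakemake's wildcard matching for the
# output pattern "project/temp/file.{n}_new.ext".
output_regex = re.compile(
    re.escape(f"{starting_folder}/file.") + r"(?P<n>.+)" + re.escape("_new.ext") + r"$"
)

def source_for(target):
    """Return the zero-padded source file for one requested target."""
    match = output_regex.match(target)
    if match is None:
        raise ValueError(f"{target!r} does not match the output pattern")
    wildcards = SimpleNamespace(**match.groupdict())  # stand-in for Snakemake's wildcards
    # Add the leading zero back: the wildcard is a string, so convert it to int first.
    return f"{starting_folder}/file.{int(wildcards.n):02d}.ext"

for target in targets[:2]:
    print(source_for(target), "->", target)
# project/temp/file.01.ext -> project/temp/file.1_new.ext
# project/temp/file.02.ext -> project/temp/file.2_new.ext
```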
Answer:
The way I would approach this problem is by using an input function and formatting the expected outputs beforehand:
numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
# a dict mapping expected numbers to their old version, assuming they are unique
expected_numbers = {n.lstrip("0"): n for n in numbers}

# listed first so that it is the default target when snakemake runs without arguments
rule target:
    input: expand("project/temp/file.{number}_new.ext", number=expected_numbers)

def expected_file(wildcards):
    """Takes the expected output number and returns the input file path"""
    old_number = expected_numbers[wildcards.number]
    return f"project/temp/file.{old_number}.ext"

rule rename_files:
    input: expected_file  # using the input function to get the expected file
    output: "project/temp/file.{number}_new.ext"
    shell: "ln -s {input} {output}"
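The dictionary and the input function are ordinary Python, so they can be exercised on their own. In this minimal sketch, types.SimpleNamespace is an assumption standing in for the wildcards object that Snakemake would pass after matching a requested file against "project/temp/file.{number}_new.ext".

```python
from types import SimpleNamespace

numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

# A dict mapping expected numbers to their old version, assuming they are unique.
expected_numbers = {n.lstrip("0"): n for n in numbers}

def expected_file(wildcards):
    """Takes the expected output number and returns the input file path."""
    old_number = expected_numbers[wildcards.number]
    return f"project/temp/file.{old_number}.ext"

# Iterating over a dict yields its keys, which is what expand() receives.
print(list(expected_numbers)[:3])  # ['1', '2', '3']

# What the input function returns for a request of project/temp/file.7_new.ext.
print(expected_file(SimpleNamespace(number="7")))  # project/temp/file.07.ext

# A number outside the list has no entry, so the lookup raises KeyError.
try:
    expected_file(SimpleNamespace(number="13"))
except KeyError as err:
    print("no source file for number", err)
```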
This approach is a bit more verbose than SultanOrazbayev's answer, but perhaps a tad more explicit. It also avoids escaping braces in the inputs and outputs of rules, which can be tricky to debug in bigger projects.
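The escaping issue mentioned above comes from f-strings: a Snakemake wildcard has to survive the f-string as literal braces, which means doubling them, and a third pair is needed when a Python value should appear inside braces. A small plain-Python illustration (the variable names follow the question):

```python
starting_folder = "project/temp"
number = "01"

# Doubled braces are emitted as literal braces, i.e. a Snakemake wildcard.
with_wildcard = f"{starting_folder}/file.{{number}}.ext"
print(with_wildcard)  # project/temp/file.{number}.ext

# Tripled braces: the outer pairs become literal braces and the inner one is
# interpolated, so the Python value ends up where a wildcard name would be.
interpolated = f"{starting_folder}/file.{{{number}}}.ext"
print(interpolated)  # project/temp/file.{01}.ext

# A plain (non-f) string needs no escaping; the folder is simply written out.
plain = "project/temp/file.{number}.ext"
assert plain == with_wildcard
```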
The approach also relies on two Snakemake features that can be useful to others:
- The expand function: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-expand-function
- Input functions: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#input-functions
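For readers new to expand(), its default behaviour can be approximated in a few lines of plain Python: the pattern is formatted once for every combination of the supplied values. expand_sketch below is a hypothetical, simplified stand-in, not Snakemake's implementation; the real function also supports, for example, a zip combinator and a list of patterns.

```python
from itertools import product

def expand_sketch(pattern, **wildcards):
    """Hypothetical, simplified stand-in for Snakemake's expand():
    format `pattern` once per combination of the given wildcard values."""
    names = list(wildcards)
    # A single string counts as one value, not as a sequence of characters.
    value_lists = [
        [values] if isinstance(values, str) else list(values)
        for values in wildcards.values()
    ]
    return [
        pattern.format(**dict(zip(names, combination)))
        for combination in product(*value_lists)
    ]

expected_numbers = {n.lstrip("0"): n for n in ["01", "02", "03"]}

# Passing a dict iterates over its keys, the numbers without leading zeros.
print(expand_sketch("project/temp/file.{number}_new.ext", number=expected_numbers))
# ['project/temp/file.1_new.ext', 'project/temp/file.2_new.ext', 'project/temp/file.3_new.ext']

# Two wildcards give every combination (a Cartesian product), in order.
print(expand_sketch("{sample}.{number}.ext", sample=["a", "b"], number=["1", "2"]))
# ['a.1.ext', 'a.2.ext', 'b.1.ext', 'b.2.ext']
```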