Format number within a path name

Time:10-23

I have a rule that uses files whose names contain a number from the series 01, 02, 03, ... 12, and I need to reformat those numbers as 1, 2, 3, ... 12 for the next step of the analysis.

I am sure there is a way to do this with either f-strings or .format(), but I am not sure how to do it within one rule where I also specify the number series with a list.

How do I get there?

A minimal example (not working):

numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

starting_folder = "project/temp"

rule rename_files:
    input: f"{starting_folder}/file.{{numbers}}.ext"
    output: f"{starting_folder}/file.{{{numbers}}:01d}_new.ext"
    shell: "ln -s {input} {output}"

E.g. I would like to get project/temp/file.1_new.ext as the output file path.

CodePudding user response:

The missing steps are:

  • use .lstrip('0') to build the desired names of the target files (the ones without leading zeros);
  • request all the target files in a rule all;
  • in the rule that needs both versions, start from the reduced integer (without leading zeros) and add the leading zeros back to get the source (original) files.

numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

starting_folder = "project/temp"

rule all:
    input: [f"{starting_folder}/file.{n.lstrip('0')}_new.ext" for n in numbers]

rule rename_files:
    input: f"{starting_folder}/file.{{n:02}}.ext"
    output: f"{starting_folder}/file.{{n}}_new.ext"
    shell: "ln -s {input} {output}"
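
The string handling behind these steps can be checked in plain Python, outside Snakemake. The sketch below is only an illustration: it pads the numbers back with str.zfill, which is an assumption of this example and not what the rule above does, and the names targets and sources are made up for the demonstration.

```python
numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
starting_folder = "project/temp"

# Target names: strip the leading zeros, as in the list comprehension of rule all.
targets = [f"{starting_folder}/file.{n.lstrip('0')}_new.ext" for n in numbers]

# Source names: start from the stripped value and pad it back to two digits.
# str.zfill is an illustrative choice here, not taken from the answer.
sources = [f"{starting_folder}/file.{n.lstrip('0').zfill(2)}.ext" for n in numbers]

print(targets[0])  # project/temp/file.1_new.ext
print(sources[0])  # project/temp/file.01.ext
print(targets[9])  # project/temp/file.10_new.ext
```

Note that lstrip('0') only removes zeros at the start of the string, so "10" stays "10".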

CodePudding user response:

I would approach this problem with an input function, after building a mapping from the expected output numbers to the original ones beforehand.

numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

# a dict mapping expected numbers to their old version, assuming they are unique 
expected_numbers = {n.lstrip("0") : n for n in numbers} 

def expected_file(wildcards):
    """Takes the expected output number and returns the input number"""
    old_number = expected_numbers[wildcards.number]
    return f"project/temp/file.{old_number}.ext"

rule rename_files:
    input: expected_file  # using the input function to get the expected file
    output: "project/temp/file.{number}_new.ext"
    shell: "ln -s {input} {output}"

rule target: 
    input: expand("project/temp/file.{number}_new.ext", number=expected_numbers)
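
The lookup done by expected_file can be tried outside Snakemake. In this sketch, types.SimpleNamespace stands in for the wildcards object that Snakemake would pass to the function; that stand-in and the name fake_wildcards are assumptions made for testing only.

```python
from types import SimpleNamespace

numbers = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]

# Same mapping as in the answer: new number -> original zero-padded number.
expected_numbers = {n.lstrip("0"): n for n in numbers}

def expected_file(wildcards):
    """Takes the expected output number and returns the input file path."""
    old_number = expected_numbers[wildcards.number]
    return f"project/temp/file.{old_number}.ext"

# Hypothetical stand-in for the wildcards of project/temp/file.3_new.ext.
fake_wildcards = SimpleNamespace(number="3")
print(expected_file(fake_wildcards))  # project/temp/file.03.ext

# Iterating over a dict yields its keys, so passing the dict as the
# number values gives the unpadded numbers in insertion order.
print(list(expected_numbers)[:3])  # ['1', '2', '3']
```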

This approach is a bit more verbose than SultanOrazbayev's answer, but maybe a tad more explicit. It also avoids escaping braces in the inputs and outputs of rules, which can be tricky to debug in bigger projects.
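
To see why the escaping matters, here is a small plain-Python sketch of what an f-string with doubled braces evaluates to. Since a Snakefile is Python, the f-string is evaluated before Snakemake interprets the pattern; the variable names pattern and plain are only for illustration.

```python
starting_folder = "project/temp"

# In an f-string, {{ and }} produce literal braces, while {starting_folder}
# is substituted immediately.
pattern = f"{starting_folder}/file.{{n}}_new.ext"
print(pattern)  # project/temp/file.{n}_new.ext

# The same pattern written as a plain string needs no escaping.
plain = "project/temp/file.{n}_new.ext"
print(pattern == plain)  # True
```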

It also uses two Snakemake features that can be useful to others: input functions and the expand() helper.