Snakemake: define output from list of filenames in text file

Time:10-10

I'm completely new to snakemake, so bear with me. I've spent a considerable amount of time searching for a similar question without any luck.

I want to create a rule to copy certain files to a new directory from a folder containing all the files.

The filenames for the files I want to copy are listed in a text file (one filename per line). I've written a small bash script using cat and xargs to copy the files listed in the text file to the target directory. This script works fine!

How do I tell snakemake that my output should be the target directory filenames listed in the text file?

Ok, so my initial thought was to create a list inside the snakefile containing all target paths for the files that should be copied.

I made this hot garbage of a mess (also completely new to python):

import glob
all_files = glob.glob("path/to/all/files", recursive = False)

file = open("path/to/file/list_of_files.txt", "r")

file_lines = file.readlines()

path_to_target_dir = "some/path/"

path_lines = [path_to_target_dir + str(x) for x in file_lines]
# for some reason path_lines end with line break after each filename. not good.

# remove line break to yield correct paths + filenames
list_of_correct_paths = []

for element in path_lines:
    list_of_correct_paths.append(element.strip())

This yields a list with all paths to where I want to copy the files.

rule cp_files_to_target_dir:
  input:
    cp_from = expand("path/to/all/files/{id}", id = all_files),
    list = "list_of_files.txt",
    script = "bash_script.sh"
  output:
    cp_to = expand("{path}", path = list_of_correct_paths)
  shell:
    "{input.script}"

However, snakemake states that I'm missing input files for the rule.

I hope my question makes sense. I appreciate any help I can get.

EDIT: this works now

import os
# filenames for all files 
files = os.listdir("/path/to/all/files")

# create paths to files of interest from text file
text_file = open("path/to/text_file.txt", "r")
list_files = text_file.read().splitlines()

target_path = "path/to/target/dir"

# os.path.join inserts the path separator that target_path lacks
target_file_paths = [os.path.join(target_path, str(x)) for x in list_files]

This yields a list with all paths to where I want to copy the files.

rule cp_files_to_target_dir:
  input:
    cp_from = expand("path/to/all/files/{id}", id = list_files),
    list = "text_file.txt",
    script = "bash_script.sh"
  output:
    cp_to = expand("{path}", path = target_file_paths)
  shell:
    "{input.script}"

Answer:

"However, snakemake states that I'm missing input files for the rule."

The variable cp_from in rule cp_files_to_target_dir probably does not contain the correct paths. To debug, I would suggest moving it outside the rule and printing it to see what it contains, e.g.:

cp_from = expand("path/to/all/files/{id}", id = all_files)
cp_to = expand("{path}", path = list_of_correct_paths)

# To debug:
print(cp_from) # Check these are what you expect
print(cp_to)

rule cp_files_to_target_dir:
  input:
    cp_from = cp_from,
    list = "list_of_files.txt",
    script = "bash_script.sh"
  output:
    cp_to = cp_to,
  shell:
    "{input.script}"

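One likely cause in the first version of the question, offered as a guess since the real directory layout isn't shown: glob.glob called on a bare directory path with no wildcard returns that path itself (or an empty list if it doesn't exist), not the files inside it. The input paths built from it then look like path/to/all/files/path/to/all/files, which don't exist. The sketch below reproduces this in a temporary directory. The file names are made up for illustration, and plain string formatting stands in for Snakemake's expand.

```python
import glob
import os
import tempfile

# Hypothetical layout: a temporary directory holding two files.
with tempfile.TemporaryDirectory() as all_files_dir:
    for name in ("a.txt", "b.txt"):
        open(os.path.join(all_files_dir, name), "w").close()

    # No wildcard: glob returns the directory path itself, not its contents.
    without_wildcard = glob.glob(all_files_dir)

    # Listing the directory gives the bare filenames the rule actually needs.
    filenames = sorted(os.listdir(all_files_dir))

    # Plain string formatting stands in for expand("dir/{id}", id=...) here.
    wrong_inputs = [f"{all_files_dir}/{x}" for x in without_wildcard]
    right_inputs = [f"{all_files_dir}/{x}" for x in filenames]

    # Paths that do not exist are the ones Snakemake would report as missing.
    missing_wrong = [p for p in wrong_inputs if not os.path.exists(p)]
    missing_right = [p for p in right_inputs if not os.path.exists(p)]

print(without_wildcard)  # a one-element list holding the directory path
print(missing_wrong)     # the doubled-up path that does not exist
print(missing_right)     # empty: every input built from filenames exists
```

Printing cp_from as suggested above should make the same problem visible in the real Snakefile, if this is indeed what happened.
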
In general, I think your script could be tidied up a bit but I cannot be more specific without more context.
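As one possible way to tidy it up, here is a minimal sketch that reads the list file once and builds both the source and target paths with os.path.join. The function name build_copy_paths and the sample filenames are hypothetical, and the directory names are the placeholders from the question.

```python
import os
import tempfile


def build_copy_paths(list_file, source_dir, target_dir):
    """Return (sources, targets) for the filenames listed in list_file."""
    with open(list_file) as handle:
        # splitlines() drops the trailing newlines; blank lines are skipped.
        names = [line.strip() for line in handle.read().splitlines() if line.strip()]
    sources = [os.path.join(source_dir, name) for name in names]
    targets = [os.path.join(target_dir, name) for name in names]
    return sources, targets


# Small self-contained demonstration with a made-up list file.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "list_of_files.txt")
    with open(list_path, "w") as handle:
        handle.write("sample1.txt\nsample2.txt\n\n")
    sources, targets = build_copy_paths(
        list_path, "path/to/all/files", "path/to/target/dir"
    )

print(sources)
print(targets)
```

In the Snakefile, sources and targets could then be passed to input and output in place of the two expand calls. A different approach, which is a common Snakemake pattern rather than anything taken from the question, is to copy one file per job using a wildcard and to request all targets from a rule all:

    rule all:
      input: targets

    rule cp_file:
      input: "path/to/all/files/{name}"
      output: "path/to/target/dir/{name}"
      shell: "cp {input} {output}"
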
