I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.
My approach is to download the fasta files using wget then run the CRB-BLAST. I have multiple files and would like to be able to download them each to their own directory (the name perhaps should come from the URL list files), then run the CRB-BLAST.
Example URLs:
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz
Ideally, the file name determines the directory name, for example, TC_3370/
.
I think there might be a solution with cat URL.txt | mkdir | cd | wget | crb-blast
Currently I just run the commands in line:
mkdir TC_3370
cd TC_3370/
wget url
http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz
crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
CodePudding user response:
Try this Shellcheck-clean program:
#! /bin/bash -p
while read -r url; do
file=${url##*/}
dir=${file%%_chr.*}
mkdir -v -- "$dir"
(
cd "./$dir" || exit 1
wget -- "$url"
crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
)
done <URL.txt
- See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of
${url##*/}
etc. - The subshell (
( ... )
) is used to ensure that thecd
doesn't affect the main program.
CodePudding user response:
Another implementation
#!/bin/sh
# Read lines as url as long as it can
while read -r url
do
# Get file name by stripping-out anything before the last / from the url
file_name=${url##*/}
# Get the destination dir name by stripping anything after the first __chr
dest_dir=${file_name%%_chr*}
# Compose the wget output path
fasta_path="$dest_dir/$file_name"
if
# Successfully created the destination directory AND
mkdir -p -- "$dest_dir" &&
# Successfully downloaded the file
wget --output-file="$fasta_path" --quiet -- "$url"
then
# Process the fasta file into fna
fna_path="$dest_dir/TCV2_annot_cds.fna"
crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o rbbh_TC
else
# Cleanup remove destination directory if any of mkdir or wget failed
rm -fr -- "$dest_dir"
fi
# reading from the URL.txt file for the whole while loop
done < URL.txt