Create directory, download file, and execute command from a list of URLs

I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.

My approach is to download the fasta files with wget and then run CRB-BLAST. I have multiple files and would like to download each one into its own directory (with the name derived from the entries in the URL list file), then run CRB-BLAST on it.

Example URLs:

http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz

Ideally, the file name determines the directory name, for example, TC_3370/.

I think there might be a solution along the lines of cat URL.txt | mkdir | cd | wget | crb-blast

Currently I just run the commands by hand, one after another:

mkdir TC_3370

cd TC_3370/

wget http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz

crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC

CodePudding user response：

Try this ShellCheck-clean program:

#! /bin/bash -p

while read -r url; do
    file=${url##*/}
    dir=${file%%_chr.*}
    mkdir -v -- "$dir"
    (
        cd "./$dir" || exit 1
        wget -- "$url"
        crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
    )
done <URL.txt

See "Removing part of a string" in BashFAQ/100 (How do I do string manipulation in bash?) for an explanation of ${url##*/} and ${file%%_chr.*}.
The subshell ( ... ) ensures that the cd does not change the working directory of the main loop.
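
To see what the two expansions and the subshell do without touching the network or crb-blast, here is a minimal sketch. The function name url_to_dir is made up for illustration and is not part of the program above; the URL is the first one from the question.

```shell
#!/bin/bash

# Hypothetical helper: derive the directory name from a URL using the
# same two parameter expansions as the program above.
url_to_dir() {
    file=${1##*/}                      # drop the longest prefix ending in "/"
    printf '%s\n' "${file%%_chr.*}"    # drop the longest suffix starting at "_chr."
}

url=http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
url_to_dir "$url"    # prints TC_3370

# A cd inside ( ... ) runs in a child shell, so the caller's working
# directory is unchanged once the subshell finishes.
before=$PWD
( cd / && pwd )      # prints /
[ "$PWD" = "$before" ] && echo "still in $before"
```

Because the directory change is confined to the subshell, each loop iteration starts from the same directory, and relative names such as URL.txt keep resolving the same way.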

CodePudding user response：

Another implementation

#!/bin/sh

# Read one URL per line until the input is exhausted
while read -r url
do
  # Get file name by stripping-out anything before the last / from the url
  file_name=${url##*/}

  # Get the destination dir name by stripping everything from the first _chr onward
  dest_dir=${file_name%%_chr*}

  # Compose the wget output path
  fasta_path="$dest_dir/$file_name"

  if
    # Successfully created the destination directory AND
    mkdir -p -- "$dest_dir" &&
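    # Successfully downloaded the file (--output-document sets the download
    # path; --output-file would only set the path of wget's log file)
    wget --output-document="$fasta_path" --quiet -- "$url"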
  then
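    # Run CRB-BLAST with the downloaded fasta as query against the target file
    # (this path assumes a copy of the target sits in the destination directory)
    fna_path="$dest_dir/TCV2_annot_cds.fna"
    # Write the output inside the destination directory so runs stay separate
    crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o "$dest_dir/rbbh_TC"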
  else
    # Clean up: remove the destination directory if either mkdir or wget failed
    rm -fr -- "$dest_dir"
  fi
  # The whole while loop reads its input from URL.txt
done < URL.txt
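
Before launching a batch of downloads and BLAST runs, it can help to print what the loop would do for each URL. The following is a dry-run sketch under the same naming assumptions as the answers. The helpers plan_for_url and plan_all are hypothetical names, and the crb-blast flags are copied from the question rather than checked against a particular crb-blast version. The download path is set with wget's -O (--output-document); -o (--output-file) names wget's log file instead.

```shell
#!/bin/sh

# Hypothetical helper: print the commands that would be run for one URL,
# without creating directories, downloading, or running crb-blast.
plan_for_url() {
    url=$1
    file_name=${url##*/}            # text after the last "/"
    dest_dir=${file_name%%_chr*}    # text before the first "_chr"
    printf 'mkdir -p -- %s\n' "$dest_dir"
    printf 'wget -O %s -- %s\n' "$dest_dir/$file_name" "$url"
    printf 'crb-blast -q %s -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o %s\n' \
        "$dest_dir/$file_name" "$dest_dir/rbbh_TC"
}

# Read URLs from standard input, one per line, skipping blank lines.
plan_all() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        plan_for_url "$url"
    done
}

# Example with one URL from the question:
printf '%s\n' 'http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz' | plan_all
```

With a real list, running plan_all < URL.txt and reading the output is a cheap way to confirm the directory names and paths before switching to the real loop.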