How to execute one command on multiple files at once in Linux

Time:09-15

Hi, I need some help creating a command that runs on more than one file at once. For example, my data looks like this:

chr_1_LUNG_CANCER_HRC_29000001_30000000_snptest.out.gz    chr_1_LUNG_CANCER_HRC_96000001_97000000_snptest.out.gz
chr_1_LUNG_CANCER_HRC_189000001_190000000_snptest.out.gz  chr_1_LUNG_CANCER_HRC_30000001_31000000_snptest.log       chr_1_LUNG_CANCER_HRC_97000001_98000000_snptest.log
chr_1_LUNG_CANCER_HRC_190000001_191000000_snptest.log     chr_1_LUNG_CANCER_HRC_30000001_31000000_snptest.out.gz    chr_1_LUNG_CANCER_HRC_97000001_98000000_snptest.out.gz
chr_1_LUNG_CANCER_HRC_190000001_191000000_snptest.out.gz  chr_1_LUNG_CANCER_HRC_3000001_4000000_snptest.log         chr_1_LUNG_CANCER_HRC_98000001_99000000_snptest.log
chr_1_LUNG_CANCER_HRC_19000001_20000000_snptest.log       chr_1_LUNG_CANCER_HRC_3000001_4000000_snptest.out.gz   

All I want to do is see the significant p-values in column 50 of all the snptest.out.gz files. I know how to do this for one file in two steps, as follows:

zcat chr_1_LUNG_CANCER_HRC_99000001_100000000_snptest.out.gz | tail -n +15 > chunk_chr_1_LUNG_CANCER_HRC_99000001_100000000_snptest.out # I removed the first 14 lines because they are plain text and I only wanted the columns.
cat chunk_chr_1_LUNG_CANCER_HRC_99000001_100000000_snptest.out | awk -v x=0.00000005 '$50 < x' > significant_hits.txt

But the issue is that there are hundreds of files in the folder, and I am a bit confused about how to run these two commands at once on all the snptest.out.gz files in the folder. I need a single output file with the combined output from all input files. Any clue?

CodePudding user response:

You could loop over all the files:

for i in *.out.gz; do zcat "$i" | tail -n +15 | awk -v x=0.00000005 '$50 < x'; done > significant_hits.txt

Or, to make it a bit nicer looking:

for i in *.out.gz; do
   zcat "$i" | tail -n +15 | awk -v x=0.00000005 '$50 < x'
done > significant_hits.txt

This assumes that all input file names end in .out.gz.
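If you want to try the loop before pointing it at real data, here is a minimal sketch that runs it on made-up input. The demo_* file names, the filler columns, and the p-values are all invented for illustration; the only things taken from the question are the 14 leading non-data lines, the p-value in column 50, and the 0.00000005 threshold. It uses tail -n +15, which starts printing at line 15 and so skips the first 14 lines.

```shell
#!/bin/sh
# Synthetic check of the loop. Every file name and value below is made up.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# make_file NAME P1 [P2 ...]: write a gzipped file with 14 header lines
# followed by one 50-column row per p-value (the p-value is column 50).
make_file() {
    name=$1
    shift
    {
        for n in $(seq 1 14); do
            echo "header line $n"
        done
        for p in "$@"; do
            printf 'c%s ' $(seq 1 49)   # filler for columns 1 to 49
            printf '%s\n' "$p"          # column 50
        done
    } | gzip > "$name"
}

make_file demo_a_snptest.out.gz 0.5 1e-9
make_file demo_b_snptest.out.gz 3e-8 0.2

# Same loop as in the answer, run against the synthetic files.
for i in *_snptest.out.gz; do
    zcat "$i" | tail -n +15 | awk -v x=0.00000005 '$50 < x'
done > significant_hits.txt

# Two of the four made-up rows are below the threshold (1e-9 and 3e-8).
wc -l < significant_hits.txt
```

The combined output ends up in one file because the redirection is attached to the whole loop (after done) rather than to the commands inside it.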

CodePudding user response:

You can use a shell for loop.

If you only need the resulting file significant_hits.txt, you can use:

for i in chr_1_LUNG_CANCER_HRC_*_snptest.out.gz
do
    zcat "$i" | tail -n +15
done | awk -v x=0.00000005 '$50 < x' > significant_hits.txt

If you also want the uncompressed *_snptest.out files:

for i in chr_1_LUNG_CANCER_HRC_*_snptest.out.gz
do
    zcat "$i" | tail -n +15 > "${i%.gz}"
done

awk -v x=0.00000005 '$50 < x' chr_1_LUNG_CANCER_HRC_*_snptest.out > significant_hits.txt

I placed the awk command outside the loop to avoid running a new awk process for every input file.

Depending on what other files exist in the current directory, you can simplify the glob pattern to, e.g., *_snptest.out.gz or *.gz.
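Before simplifying the pattern, it may help to check how many files each candidate actually matches. The following is only a sketch: count_matches is a hypothetical helper name, and it assumes file names without spaces, as in the listing in the question.

```shell
#!/bin/sh
# count_matches is a hypothetical helper, not part of any answer above.
# It prints how many existing files in the current directory match a glob.
count_matches() {
    n=0
    for f in $1; do            # $1 is left unquoted so the pattern expands
        # A pattern with no match stays as literal text, so check existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_matches '*_snptest.out.gz'
count_matches '*.gz'
```

If the two counts differ, the shorter pattern is picking up extra files, and the longer one is the safer choice for the loop.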
