Hi, I need some help creating a command that processes more than one file at once. For example, my data looks like this:
chr_1_LUNG_CANCER_HRC_29000001_30000000_snptest.out.gz chr_1_LUNG_CANCER_HRC_96000001_97000000_snptest.out.gz
chr_1_LUNG_CANCER_HRC_189000001_190000000_snptest.out.gz chr_1_LUNG_CANCER_HRC_30000001_31000000_snptest.log chr_1_LUNG_CANCER_HRC_97000001_98000000_snptest.log
chr_1_LUNG_CANCER_HRC_190000001_191000000_snptest.log chr_1_LUNG_CANCER_HRC_30000001_31000000_snptest.out.gz chr_1_LUNG_CANCER_HRC_97000001_98000000_snptest.out.gz
chr_1_LUNG_CANCER_HRC_190000001_191000000_snptest.out.gz chr_1_LUNG_CANCER_HRC_3000001_4000000_snptest.log chr_1_LUNG_CANCER_HRC_98000001_99000000_snptest.log
chr_1_LUNG_CANCER_HRC_19000001_20000000_snptest.log chr_1_LUNG_CANCER_HRC_3000001_4000000_snptest.out.gz
All I want to do is see the significant p-values in column 50 of all the snptest.out.gz files. I know how to do this for one file in two steps, as follows:
zcat chr_1_LUNG_CANCER_HRC_99000001_100000000_snptest.out.gz | tail -n +15 > chunk_chr_1_LUNG_CANCER_HRC_99000001_100000000_snptest.out # I removed the first 14 lines because they are plain text and I only want the columns.
cat chunk_chr_1_LUNG_CANCER_HRC_99000001_100000000_snptest.out | awk -v x=0.00000005 '$50 < x' > significant_hits.txt
But the issue is that there are hundreds of files in the folder, and I am a bit confused about how to run these two commands on all the snptest.out.gz files in the folder at once. I need a single output file with the combined output from all input files. Any clue?
CodePudding user response:
You could loop over all the files:
for i in *.out.gz; do zcat "$i" | tail -n +15 | awk -v x=0.00000005 '$50 < x'; done > significant_hits.txt
Or, to make it a bit nicer looking:
for i in *.out.gz; do
    zcat "$i" | tail -n +15 | awk -v x=0.00000005 '$50 < x'
done > significant_hits.txt
This assumes that all input file names end in .out.gz.
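If no file matches the pattern, the shell passes the pattern itself to the loop as a literal string and zcat fails on it. Here is a minimal sketch of the same loop with a guard for that case, assuming the first 14 lines of each file are header text; the function name significant_hits is hypothetical.

```shell
# Hypothetical helper: print rows whose column 50 is below a threshold.
# Usage: significant_hits THRESHOLD FILE...
significant_hits() {
    threshold=$1
    shift
    for f in "$@"; do
        # If a glob matched nothing, "$f" is the literal pattern; skip it.
        [ -f "$f" ] || continue
        # gzip -dc is equivalent to zcat; tail -n +15 skips the first 14 lines.
        gzip -dc "$f" | tail -n +15
    done | awk -v x="$threshold" '$50 < x'
}
```

It could be called as: significant_hits 0.00000005 *.out.gz > significant_hits.txt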
CodePudding user response:
You can use a shell for loop.
If you only need the resulting file significant_hits.txt, you can use:
for i in chr_1_LUNG_CANCER_HRC_*_snptest.out.gz
do
    zcat "$i" | tail -n +15
done | awk -v x=0.00000005 '$50 < x' > significant_hits.txt
If you also want the uncompressed *_snptest.out files:
for i in chr_1_LUNG_CANCER_HRC_*_snptest.out.gz
do
    zcat "$i" | tail -n +15 > "${i%.gz}"
done
awk -v x=0.00000005 '$50 < x' chr_1_LUNG_CANCER_HRC_*_snptest.out > significant_hits.txt
I placed the awk command outside the loop to avoid running a new awk process for every input file.
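When awk is given the uncompressed files by name, as in the last command, it also knows which file each row came from. Here is a minimal sketch that prefixes every hit with its source file, assuming you want that in the combined output; the function name hits_with_source is hypothetical.

```shell
# Hypothetical helper: filter on column 50 and prefix each hit with the
# name of the file it came from (awk's built-in FILENAME variable).
# Usage: hits_with_source THRESHOLD FILE...
hits_with_source() {
    threshold=$1
    shift
    # Without file arguments awk would wait on standard input, so stop here.
    [ "$#" -gt 0 ] || return 0
    awk -v x="$threshold" '$50 < x { print FILENAME "\t" $0 }' "$@"
}
```

It could be called as: hits_with_source 0.00000005 chr_1_LUNG_CANCER_HRC_*_snptest.out > significant_hits.txt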
Depending on what other files may exist in the current directory, you can simplify the glob pattern to e.g. *_snptest.out.gz or *.gz.