Loops in bash to organise fastqs

Time:11-02

I have a problem that I think needs loops to solve.

I have fastq files that follow a naming convention, all in one directory, e.g. 'allpools':

ls allpools 

2022_pool_1_Seq_GEX.fastq.gz 
2022_pool_1_Seq_CMO.fastq.gz
2022_pool_2_Seq_GEX.fastq.gz 
2022_pool_2_Seq_CMO.fastq.gz

I need one loop to make a directory in 'allpools' for each pool number that appears in the fastq names. The resulting directories should be called:

pool1
pool2 

and then another loop that moves the original fastqs into the respective GEX and CMO subdirectories. For example:

allpools/pool1/pool1_GEX #containing all gex fastqs of pool_1
allpools/pool1/pool1_CMO #containing all cmo fastqs of pool_1
####
allpools/pool2/pool2_GEX #containing all gex fastqs of pool_2
allpools/pool2/pool2_CMO #containing all cmo fastqs of pool_2

CodePudding user response:

Before:

$ tree allpools/
allpools/
├── 2022_pool_1_Seq_CMO.fastq.gz
├── 2022_pool_1_Seq_GEX.fastq.gz
├── 2022_pool_2_Seq_CMO.fastq.gz
└── 2022_pool_2_Seq_GEX.fastq.gz

Use sed to parse the filename and generate the directory name:

$ for f in allpools/*; do d=$(sed -E 's%.*(pool_[0-9]+).*(GEX|CMO).*%\1/\1_\2%' <<<"$f"); mkdir -p "allpools/$d"; mv -vi "$f" "allpools/$d/"; done
renamed 'allpools/2022_pool_1_Seq_CMO.fastq.gz' -> 'allpools/pool_1/pool_1_CMO/2022_pool_1_Seq_CMO.fastq.gz'
renamed 'allpools/2022_pool_1_Seq_GEX.fastq.gz' -> 'allpools/pool_1/pool_1_GEX/2022_pool_1_Seq_GEX.fastq.gz'
renamed 'allpools/2022_pool_2_Seq_CMO.fastq.gz' -> 'allpools/pool_2/pool_2_CMO/2022_pool_2_Seq_CMO.fastq.gz'
renamed 'allpools/2022_pool_2_Seq_GEX.fastq.gz' -> 'allpools/pool_2/pool_2_GEX/2022_pool_2_Seq_GEX.fastq.gz'

After:

$ tree allpools/
allpools/
├── pool_1
│   ├── pool_1_CMO
│   │   └── 2022_pool_1_Seq_CMO.fastq.gz
│   └── pool_1_GEX
│       └── 2022_pool_1_Seq_GEX.fastq.gz
└── pool_2
    ├── pool_2_CMO
    │   └── 2022_pool_2_Seq_CMO.fastq.gz
    └── pool_2_GEX
        └── 2022_pool_2_Seq_GEX.fastq.gz
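
The command above keeps the underscore (pool_1), while the question asked for directories named pool1 and pool1_GEX. A minimal sketch of that variant follows, assuming every file name contains pool_<number> followed later by GEX or CMO, as in the listing. The function name organise_fastqs is made up for illustration, and mv -n is used here instead of -vi so that an existing file is never overwritten or prompted for.

```shell
#!/usr/bin/env bash
# Sketch only: organise_fastqs is a hypothetical helper, not part of the answer above.
# Moves DIR/*_pool_N_*_(GEX|CMO).fastq.gz into DIR/poolN/poolN_(GEX|CMO)/.
organise_fastqs() {
    dir=${1:-allpools}
    for f in "$dir"/*.fastq.gz; do
        [ -f "$f" ] || continue              # glob matched nothing, or not a regular file
        name=${f##*/}                        # strip the directory part
        # 2022_pool_1_Seq_GEX.fastq.gz -> pool1/pool1_GEX; prints nothing if no match
        sub=$(printf '%s\n' "$name" |
              sed -nE 's%.*pool_([0-9]+).*(GEX|CMO).*%pool\1/pool\1_\2%p')
        if [ -n "$sub" ]; then
            mkdir -p "$dir/$sub"
            mv -n "$f" "$dir/$sub/"          # -n: never overwrite an existing file
        else
            printf 'skipping unrecognised name: %s\n' "$f" >&2
        fi
    done
}
```

Calling organise_fastqs allpools on the four example files should leave them under allpools/pool1/pool1_GEX, allpools/pool1/pool1_CMO, allpools/pool2/pool2_GEX and allpools/pool2/pool2_CMO. Files whose names do not match are reported on stderr and left where they are.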