How can I loop through consecutive file directories?


I'm in directory a. Under directory a there is a fixed directory b. Under b there are directories d{091...099}, and each of those contains various *.gz files. I need to extract data from these files (their filenames) and print it, one file after the other.

I have tried this:

#!/bin/bash

for file in $(find . -name "*.gz"); do
  file=$(basename ${file})
  echo ${file:0:4}   | tee -a receiver_ids > log
  echo ${file:16:17} | tee -a doy > log2
  echo ${file:0:100} | tee -a data_record > log3
done
cut -c 1-3 < doy > doy2
cut -c 1-23 < data_record > summary_name

But by doing this, the files are processed in no particular order. This is what I get from:

cat data_record
ISTA00TUR_R_20190940000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190990000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190970000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190920000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190980000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190910000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190960000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190930000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190950000_01D_30S_MO.crx.gz

How can I fix this?

CodePudding user response:

"the files are processed in an unordered way" -- that is probably because find does not guarantee any particular order for its results.

How about this:

for file in b/d*/*gz; do
  ...
done
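A throwaway sketch (the temporary layout below is purely illustrative) shows why the glob works: bash's pathname expansion returns its matches in lexicographic order, so the d091..d099 directories are already visited in sequence without an explicit sort.

```shell
# Demo: glob matches come back sorted, even though the files
# were created out of order.
demo=$(mktemp -d)
mkdir -p "$demo"/b/d091 "$demo"/b/d092 "$demo"/b/d093
touch "$demo"/b/d092/x.gz "$demo"/b/d091/a.gz "$demo"/b/d093/z.gz

names=""
for f in "$demo"/b/d*/*.gz; do
    names="$names$(basename "$f") "
done
names=${names% }        # drop the trailing separator
echo "$names"           # a.gz x.gz z.gz
rm -rf "$demo"
```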

or

for file in $(find . -name "*.gz" | sort); do
  ...
done
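One caveat with the `$(find ... | sort)` form: the unquoted command substitution splits on whitespace, so a filename containing a space would break apart. A sketch of a whitespace-safe variant (assuming GNU find and sort, which provide `-print0`/`-z`):

```shell
# Demo: ordered, whitespace-safe iteration over NUL-delimited paths.
dir=$(mktemp -d)                       # demo directory, illustrative only
touch "$dir/b 2.gz" "$dir/a 1.gz"      # names with spaces on purpose

seen=""
while IFS= read -r -d '' path; do
    seen="$seen$(basename "$path"):"
done < <(find "$dir" -name '*.gz' -print0 | sort -z)

echo "$seen"                           # a 1.gz:b 2.gz:
rm -rf "$dir"
```

Feeding the loop with `< <(...)` instead of a pipe keeps it in the current shell, so variables set inside the loop survive after `done`.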

By the way, if I were you, I would not use the name file for the variable: there is a program called file, and it is easy to get confused.

CodePudding user response:

As you're running Ubuntu, I assume that you have access to the GNU version of the standard tools.

It seems like you want to sort your file paths by filename. It's a little complicated, but here's a way to do the task robustly and somewhat efficiently:

#!/bin/bash

find . -name '*.gz' -printf '%f\0' |
sort -z |
while IFS='' read -r -d '' fname
do
    printf '%s\n' "${fname:0:4}" >&3
    printf '%s\n' "${fname:16:17}" >&4
    printf '%s\n' "${fname:0:100}" >&5
done \
   3> >(tee -a receiver_ids > log) \
   4> >(tee -a doy > log2) \
   5> >(tee -a data_record > log3)

Explanations:

  • The -printf '%f\0' action of GNU find outputs a NUL-delimited stream of filenames (i.e. without any leading directory path).

  • GNU sort -z sorts those NUL-delimited filenames.

  • bash's while IFS='' read -r -d '' fname loads each NUL-delimited filename into the variable fname.

  • Then I define a few file descriptors (3, 4, and 5) for the while ...; do ...; done loop, each one bound to a process substitution; the purpose is to avoid forking inside the loop (which would be horribly slow).
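The substring expansions used inside the loop can be sanity-checked on one of the sample filenames from the question (the `site`/`mid`/`doy` variable names below are just labels for this demo):

```shell
# Demo: what each slice of the filename actually extracts.
fname='ISTA00TUR_R_20190940000_01D_30S_MO.crx.gz'
site=${fname:0:4}        # first 4 chars           -> ISTA
mid=${fname:16:17}       # 17 chars from index 16  -> 0940000_01D_30S_M
doy=${mid:0:3}           # what `cut -c 1-3` keeps -> 094 (day of year)
echo "$site $doy"        # ISTA 094
```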
