I'm in directory a. Under a there is a fixed directory b, and b contains the directories d091 to d099. Each of these d directories holds several *.gz files. I need to extract data from the names of these files and print it in order, one file after the other.
I have tried this:
#!/bin/bash
for file in $(find . -name "*.gz"); do
    file=$(basename ${file})
    echo ${file:0:4} | tee -a receiver_ids > log
    echo ${file:16:17} | tee -a doy > log2
    echo ${file:0:100} | tee -a data_record > log3
done
cut -c 1-3 < doy > doy2
cut -c 1-23 < data_record > summary_name
However, this processes the files in no particular order. This is what I get from cat data_record:
ISTA00TUR_R_20190940000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190990000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190970000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190920000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190980000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190910000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190960000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190930000_01D_30S_MO.crx.gz
ISTA00TUR_R_20190950000_01D_30S_MO.crx.gz
How can I fix this?
CodePudding user response:
"The files are processed in an unordered way": that is most likely because find returns its results in directory order, which is not sorted and so not the order you want.
How about this:
for file in b/d*/*gz; do
...
done
or
for file in $(find . -name "*.gz" | sort); do
...
done
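To check the first variant before touching real data, here is a minimal sketch. It assumes a throwaway tree created with mktemp that mimics the b/d091 ... b/d099 layout from the question; the variable names tmp, path and ordered are made up for the illustration. The shell sorts glob results by pathname, so no explicit sort is needed.

```shell
#!/bin/bash
# Hypothetical sandbox that mimics the layout from the question.
tmp=$(mktemp -d)
for doy in 094 099 091; do
    mkdir -p "$tmp/b/d$doy"
    : > "$tmp/b/d$doy/ISTA00TUR_R_2019${doy}0000_01D_30S_MO.crx.gz"
done

# Pathname expansion returns the matches sorted, so the loop sees
# d091 first and d099 last, whatever order they were created in.
ordered=$(
    for path in "$tmp"/b/d*/*.gz; do
        printf '%s\n' "${path##*/}"   # strip the directories, like basename
    done
)
printf '%s\n' "$ordered"

rm -rf "$tmp"
```

Using ${path##*/} instead of $(basename ...) also avoids starting an extra process for every file.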
By the way, if I were you, I would not name the variable file: there is also a program called file, so it is easy to get confused.
CodePudding user response:
As you're running Ubuntu, I assume that you have access to the GNU version of the standard tools.
It seems like you want to sort your filepaths by filename. It's a little complicated but here's a way to do the task robustly and somewhat efficiently:
#!/bin/bash
find . -name '*.gz' -printf '%f\0' |
sort -z |
while IFS='' read -r -d '' fname
do
    printf '%s\n' "${fname:0:4}" >&3
    printf '%s\n' "${fname:16:17}" >&4
    printf '%s\n' "${fname:0:100}" >&5
done \
    3> >(tee -a receiver_ids > log) \
    4> >(tee -a doy > log2) \
    5> >(tee -a data_record > log3)
Explanations:
The -printf '%f\0' predicate of GNU find outputs a NUL-delimited stream of filenames (i.e. without any leading directory path).
GNU sort -z sorts those NUL-delimited filenames.
bash's while IFS='' read -r -d '' fname loads each NUL-delimited filename into the variable fname.
Finally, I define a few file descriptors (3, 4 and 5) for the while ...; do ...; done loop, each one bound to a process substitution; the purpose is to get rid of the forks inside the loop (which are horribly slow).
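To try the file-descriptor part in isolation, here is a minimal sketch. It is a simplified variant of the script, not a replacement for it, and it rests on a few assumptions: a fixed, unsorted list of names stands in for the find output, the descriptors point straight at files in a scratch directory (the variable out is made up) instead of at tee, and the two cut steps from the question are folded into the substring lengths (3 characters for the day of year, 23 for the summary name).

```shell
#!/bin/bash
# Hypothetical scratch directory so the demo does not touch real files.
out=$(mktemp -d)

# An unsorted NUL-delimited list stands in for:
#   find . -name '*.gz' -printf '%f\0'
printf '%s\0' \
    ISTA00TUR_R_20190940000_01D_30S_MO.crx.gz \
    ISTA00TUR_R_20190910000_01D_30S_MO.crx.gz \
    ISTA00TUR_R_20190990000_01D_30S_MO.crx.gz |
sort -z |
while IFS='' read -r -d '' fname
do
    printf '%s\n' "${fname:0:4}"  >&3   # receiver ID, e.g. ISTA
    printf '%s\n' "${fname:16:3}" >&4   # day of year, e.g. 091
    printf '%s\n' "${fname:0:23}" >&5   # summary name
done \
    3>> "$out/receiver_ids" \
    4>> "$out/doy" \
    5>> "$out/summary_name"

# Keep the results in variables, then clean up the scratch directory.
doys=$(cat "$out/doy")
summaries=$(cat "$out/summary_name")
printf '%s\n' "$doys"
rm -rf "$out"
```

Each output file is opened once for the whole loop, and printf is a bash builtin, so no process is started per filename. If the separate log copies are needed, the tee process substitutions from the script do that job at the cost of three extra processes in total.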