I'm trying to use awk and GNU parallel to filter a set of .csv.gz files by the values in columns 1 and 2 and write the matching rows to a single .csv.gz file. Thanks to the answer here, I managed to write myscript.sh to do the job in parallel:
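#!/bin/bash
# Decompress one file and keep rows where column 1 > 0.5 and column 2 < 1.5
doit() {
pigz -dc "$1" | awk -F, '$1>0.5 && $2<1.5'
}
export -f doit
# Run doit on every .csv.gz under the directory given as $1, then recompress the combined output
find "$1" -name '*.csv.gz' | parallel doit | pigz > output.csv.gz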
I then run the script from the terminal:
./myscript.sh /path/to/files
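Before adding parallelism, the awk condition can be checked on its own. This sketch feeds a few made-up rows through the same filter, with no pigz or parallel involved; the helper names `sample_rows` and `filter_rows` are hypothetical:

```shell
#!/bin/bash
# Hypothetical sample rows in the same comma-separated layout
sample_rows() {
    printf '%s\n' '0.7,1.0' '0.4,1.0' '0.9,2.0' '0.6,1.4'
}

# The filter from doit: keep rows with column 1 above 0.5 and column 2 below 1.5
filter_rows() {
    awk -F, '$1>0.5 && $2<1.5'
}

sample_rows | filter_rows
```

Only `0.7,1.0` and `0.6,1.4` pass both conditions, so those two rows are printed.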
How can I pass 0.5 and 1.5 as arguments to myscript.sh, so that I can call it like this?
./myscript.sh /path/to/files 0.5 1.5
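When the thresholds come in as positional parameters like this, the script can check them before handing them to awk; this matters most if they are pasted into the awk program text. A minimal sketch, assuming the hypothetical helper names `is_number` and `check_args` and a deliberately simple number format (optional minus sign, digits, optional decimal part, so `.5` or `1e3` are rejected):

```shell
#!/bin/bash
# True if $1 is an optional minus sign, digits, and an optional decimal part
is_number() {
    local re='^-?[0-9]+(\.[0-9]+)?$'
    [[ $1 =~ $re ]]
}

# Fail with a usage message unless given DIR LOW HIGH with numeric LOW and HIGH
check_args() {
    if (( $# != 3 )) || ! is_number "$2" || ! is_number "$3"; then
        echo "usage: myscript.sh DIR LOW HIGH" >&2
        return 1
    fi
}

check_args /path/to/files 0.5 1.5 && echo "arguments ok"
```

Inside myscript.sh this would be used as `check_args "$@" || exit 1` near the top.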
Answer:
This may be an easier, or at least more explicit, way of passing variables and parameters around:
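#!/bin/bash
dir="$1"
# Pick up the second and third parameters, defaulting to 0.5 and 1.5 if unspecified
a=${2:-0.5}
b=${3:-1.5}
doit() {
    local file=$1 a=$2 b=$3
    # Progress message goes to stderr so it does not end up in the filtered output
    echo "File: $file, a=$a, b=$b" >&2
    # -v hands the thresholds to awk as variables instead of splicing them into the program
    pigz -dc "$file" | awk -F, -v a="$a" -v b="$b" '$1>a && $2<b'
}
export -f doit
find "$dir" -name '*.csv.gz' | parallel doit {} "$a" "$b" | pigz > output.csv.gz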
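The key piece in this answer is awk's `-v` option, which sets an awk variable before the program runs. A minimal check on made-up rows that the thresholds arrive as numbers; `filter_between` is a hypothetical helper name:

```shell
#!/bin/bash
# Filter CSV rows on stdin, passing the shell values to awk with -v
filter_between() {
    local lo=$1 hi=$2
    awk -F, -v a="$lo" -v b="$hi" '$1>a && $2<b'
}

# "10,1" is kept: -v values that look like numbers compare numerically, so 10 > 5
printf '%s\n' '10,1' '2,1' '7,9' | filter_between 5 6
```

Only `10,1` is printed. If the comparison were done on strings, `"10"` would sort before `"5"` and that row would be dropped.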
Answer:
#!/bin/bash
doit() {
    # $2 and $3 are pasted into the awk program text, so they must be plain numbers
    pigz -dc "$1" | awk -F, '$1>'"$2"' && $2<'"$3"
}
export -f doit
find "$1" -name '*.csv.gz' | parallel doit {} "$2" "$3" | pigz > output.csv.gz
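Call as follows. paste needs -d, because awk splits fields on commas, and the test file goes in its own directory so that find does not also pick up output.csv.gz while it is being written:
mkdir -p data
paste -d, <(seq 10 | shuf) <(seq 10 | shuf) | gzip > data/h.csv.gz
./myscript.sh data 5 6
zcat output.csv.gz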
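A variation that avoids pasting the values into the awk program is to export them and read them through awk's `ENVIRON` array. GNU parallel starts each job in a child shell that inherits exported variables and functions; in this sketch `bash -c` plays that role so it can be tried without parallel or pigz. The names `LO`, `HI` and `doit_env` are hypothetical, and the function reads an uncompressed file:

```shell
#!/bin/bash
# Thresholds are exported so child processes (such as parallel's jobs) inherit them
export LO=0.5 HI=1.5

# Filter one uncompressed CSV file; +0 forces a numeric comparison of the ENVIRON strings
doit_env() {
    awk -F, '$1 > ENVIRON["LO"]+0 && $2 < ENVIRON["HI"]+0' "$1"
}
export -f doit_env

# Hypothetical sample data in a temporary file, removed when the script exits
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' '0.7,1.0' '0.4,1.0' '0.6,2.0' > "$tmp"

# A child bash sees both the exported function and the exported variables
bash -c 'doit_env "$1"' _ "$tmp"
```

In myscript.sh this would mean `export LO=$2 HI=$3` near the top and `pigz -dc "$1" | awk -F, '$1 > ENVIRON["LO"]+0 && $2 < ENVIRON["HI"]+0'` as the body of doit.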