I have two files,
>cat foo.txt
QGP 1044
TGP 634
KGP 616
DGA 504
PGP 481
KGD 465
QGE 456
TGD 393
DGS 367
TGA 366
>cat bar.txt
QGP 748.6421
TGP 564.0048
KGP 568.7543
DGA 193.6391
PGP 405.1929
KGD 248.7047
QGE 287.7652
TGD 246.6278
DGS 143.6255
TGA 210.1166
Column 1 is identical in both files. I need to compute the following for each row:
(foo.txt$column2 - bar.txt$column2)/sqrt(bar.txt$column2)
and output column 1 together with the computed value. I can't figure out how to iterate over the rows of both files using awk. I'd really appreciate any help!
CodePudding user response:
The idiomatic technique is to iterate over the first file and build a mapping from $1 to $2, then iterate over the second file and look up the stored value for the current $1:
awk '
    NR == FNR {   # this condition is true for the lines of the first file [1]
        foo[$1] = $2
        next
    }
    {
        print $1, (foo[$1] - $2) / sqrt($2)
    }
' foo.txt bar.txt
outputs
QGP 10.7947
TGP 2.94732
KGP 1.98107
DGA 22.3034
PGP 3.76599
KGD 13.7153
QGE 9.91737
TGD 9.32047
DGS 18.6388
TGA 10.754
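The question states that column 1 is identical in both files. If that ever stops being true, the lookup above silently treats a missing key as 0, and a zero in column 2 of bar.txt makes awk abort with a division by zero. The following is a minimal sketch of a more defensive variant, not part of the original answer: the scratch files and the key XXX are hypothetical, and the fixed four-decimal printf format is just one possible choice.

```shell
# Hypothetical scratch copies of the first two rows, plus a key (XXX)
# that exists only in bar.txt.
tmp=$(mktemp -d)
printf 'QGP 1044\nTGP 634\n' > "$tmp/foo.txt"
printf 'QGP 748.6421\nTGP 564.0048\nXXX 10\n' > "$tmp/bar.txt"

result=$(awk '
    NR == FNR { foo[$1] = $2; next }      # first file: remember column 2 by key
    ($1 in foo) && $2 > 0 {               # skip unknown keys and non-positive divisors
        printf "%s %.4f\n", $1, (foo[$1] - $2) / sqrt($2)
    }
' "$tmp/foo.txt" "$tmp/bar.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With these inputs the XXX row is dropped and the two remaining rows are printed with four decimal places.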
[1]: NR == FNR. FNR is the record number within the current file. NR is the total number of records seen so far across all files. Those values will only be the same for the first file.
This breaks down when the first file is empty. In that case, NR == FNR is true for the first file that has at least one line.
A more reliable condition is:
awk '
    FILENAME == ARGV[1] {
        # do stuff for the first file
        next
    }
    {
        # this action is for each subsequent file
    }
' file1 file2 ...
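To see the difference between the two conditions, here is a small sketch that runs both against an empty first file. The file names empty.txt and data.txt and the "first:"/"other:" labels are made up for the demonstration.

```shell
# Hypothetical scratch files: empty.txt has no lines, data.txt has two.
tmp=$(mktemp -d)
: > "$tmp/empty.txt"
printf 'a 1\nb 2\n' > "$tmp/data.txt"

# NR == FNR: the empty file contributes no records, so every line of
# data.txt is mistaken for a line of the first file.
nr_fnr=$(awk 'NR == FNR { print "first:", $0; next } { print "other:", $0 }' \
    "$tmp/empty.txt" "$tmp/data.txt")

# FILENAME == ARGV[1]: data.txt is not the first argument, so its lines
# fall through to the second action.
by_name=$(awk 'FILENAME == ARGV[1] { print "first:", $0; next } { print "other:", $0 }' \
    "$tmp/empty.txt" "$tmp/data.txt")

printf '%s\n' "$nr_fnr" "$by_name"
rm -rf "$tmp"
```

The first command labels both data lines "first:", while the second labels them "other:".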
CodePudding user response:
You could use join:
$ join foo.txt bar.txt | awk '{print $1, ($2 - $3)/sqrt($3)}'
or (assuming both files list the same keys in the same order) read the two files in parallel with awk:
$ awk '{getline b < "bar.txt"; split(b, a); print $1, ($2 - a[2])/sqrt(a[2])}' foo.txt
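Note that join expects both inputs to be sorted on the join field, and the sample files are not sorted alphabetically by column 1. A minimal sketch that sorts first, using hypothetical scratch copies of the first three rows (the output then comes out in sorted key order rather than the original order):

```shell
# Hypothetical scratch copies of the first three rows of each file.
tmp=$(mktemp -d)
printf 'QGP 1044\nTGP 634\nKGP 616\n' > "$tmp/foo.txt"
printf 'QGP 748.6421\nTGP 564.0048\nKGP 568.7543\n' > "$tmp/bar.txt"

# Sort both files on column 1 so that join sees its inputs in key order.
sort -k1,1 "$tmp/foo.txt" > "$tmp/foo.sorted"
sort -k1,1 "$tmp/bar.txt" > "$tmp/bar.sorted"

# After join: $1 is the key, $2 comes from foo.txt, $3 from bar.txt.
joined=$(join "$tmp/foo.sorted" "$tmp/bar.sorted" |
    awk '{ print $1, ($2 - $3) / sqrt($3) }')

printf '%s\n' "$joined"
rm -rf "$tmp"
```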
CodePudding user response:
A Perl solution (assuming the input files are tab-delimited):
paste foo.txt bar.txt | \
perl -F'\t' -lane 'print join "\t", $F[0], ( ($F[1] - $F[3]) / ($F[3])**0.5 );' > out.txt
The Perl one-liner uses these command line flags:
-e: Tells Perl to look for code in-line, instead of in a file.
-n: Loop over the input one line at a time, assigning it to $_ by default.
-l: Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a: Split $_ into array @F on whitespace or on the regex specified in the -F option.
-F'\t': Split into @F on TAB, rather than on whitespace. The array @F is zero-indexed.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
CodePudding user response:
Yet another way of writing it:
$ awk '{
    if($1 in a)                        # key has been seen before, i.e. we are in the 2nd file
        print $1,(a[$1]-$2)/sqrt($2)   # compute and output
    else                               # otherwise we are in the 1st file
        a[$1]=$2                       # hash the value
}' foo.txt bar.txt
Some output:
QGP 10.7947
TGP 2.94732
KGP 1.98107
...