math on second column of two files

Time:05-13

I have two files,

>cat foo.txt
QGP 1044
TGP 634
KGP 616
DGA 504
PGP 481
KGD 465
QGE 456
TGD 393
DGS 367
TGA 366
>cat bar.txt
QGP 748.6421
TGP 564.0048
KGP 568.7543
DGA 193.6391
PGP 405.1929
KGD 248.7047
QGE 287.7652
TGD 246.6278
DGS 143.6255
TGA 210.1166

Column 1 is identical in both files. I need to do a mathematical operation like so,

(foo.txt$column2 - bar.txt$column2)/sqrt(bar.txt$column2)

and output column1 and the math-operated column2. I can't figure out how to iterate over each row using awk. Really appreciate any help!

CodePudding user response:

The idiomatic technique is to iterate over the first file and build a mapping from $1 to $2, then iterate over the second file and look up the stored value for the current $1:

awk '
    NR == FNR { # this condition is true for the lines of the first file [1]
        foo[$1] = $2
        next
    }
    {
        print $1, (foo[$1] - $2) / sqrt($2)
    }
' foo.txt bar.txt

outputs

QGP 10.7947
TGP 2.94732
KGP 1.98107
DGA 22.3034
PGP 3.76599
KGD 13.7153
QGE 9.91737
TGD 9.32047
DGS 18.6388
TGA 10.754
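If the two files might not contain exactly the same keys, or bar.txt might contain a zero, the lookup can be guarded. The following is only a sketch under those assumptions: the rows XXX and ZZZ are hypothetical additions to the question's data, and the choice to skip such rows with a message on stderr is mine, not part of the original answer.

```shell
# Two rows from the question, plus hypothetical rows: XXX exists only in
# bar.txt, and ZZZ has a zero in bar.txt (division by sqrt(0)).
tmp=$(mktemp -d)
printf 'QGP 1044\nTGP 634\nZZZ 5\n' > "$tmp/foo.txt"
printf 'QGP 748.6421\nTGP 564.0048\nXXX 12.5\nZZZ 0\n' > "$tmp/bar.txt"

result=$(awk '
    NR == FNR    { foo[$1] = $2; next }   # first file: remember column 2 by key
    !($1 in foo) { print "skipping " $1 ": not in first file" > "/dev/stderr"; next }
    $2 <= 0      { print "skipping " $1 ": non-positive denominator" > "/dev/stderr"; next }
                 { print $1, (foo[$1] - $2) / sqrt($2) }
' "$tmp/foo.txt" "$tmp/bar.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the QGP and TGP rows reach stdout; the two skipped keys are reported on stderr.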

[1]: NR == FNR

FNR is the record number within the current file. NR is the total number of records read so far across all files. The two values are equal only while the first file is being read. This breaks down when the first file is empty: in that case, NR == FNR is true for the first file that has at least one line. A more reliable condition is:

awk '
    FILENAME == ARGV[1] {
        # actions for lines of the first file go here
        next
    }
    {
        # actions for lines of each subsequent file go here
    }
' file1 file2 ...
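To see the difference between the two conditions, here is a small demonstration with an empty first file. The file name empty.txt and the "second:" label are made up for illustration.

```shell
tmp=$(mktemp -d)
: > "$tmp/empty.txt"                        # empty first file
printf 'QGP 748.6421\n' > "$tmp/bar.txt"

# NR == FNR: bar.txt is the first file with any lines, so it is mistaken
# for the first file and the second block never runs.
nr_fnr=$(awk 'NR == FNR { seen[$1] = $2; next } { print "second:", $1 }' \
    "$tmp/empty.txt" "$tmp/bar.txt")

# FILENAME == ARGV[1]: bar.txt does not match the first argument, so its
# lines are handled by the second block as intended.
by_name=$(awk 'FILENAME == ARGV[1] { seen[$1] = $2; next } { print "second:", $1 }' \
    "$tmp/empty.txt" "$tmp/bar.txt")

printf 'NR == FNR           -> [%s]\n' "$nr_fnr"
printf 'FILENAME == ARGV[1] -> [%s]\n' "$by_name"
rm -rf "$tmp"
```

The first command prints nothing, while the second prints one line for the QGP row.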

CodePudding user response:

You could use join (which expects both files to be sorted on the join field):

$ join foo.txt bar.txt | awk '{print $1, ($2 - $3)/sqrt($3)}'

or (assuming the lines of both files are in the same order) read the two files in lockstep with awk:

$ awk '{getline b < "bar.txt"; split(b, a); print $1, ($2 - a[2])/sqrt(a[2])}' foo.txt
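When the inputs are not already sorted on column 1, one option is to sort copies of them first. This is a sketch, not part of the original answer: the .sorted file names are hypothetical, LC_ALL=C is set so that sort and join agree on the collation order, and the output comes out in key order rather than in the original row order.

```shell
tmp=$(mktemp -d)
printf 'QGP 1044\nTGP 634\nKGP 616\n' > "$tmp/foo.txt"
printf 'QGP 748.6421\nTGP 564.0048\nKGP 568.7543\n' > "$tmp/bar.txt"

# Sort both files on the key column using the same collation as join.
LC_ALL=C sort -k1,1 "$tmp/foo.txt" > "$tmp/foo.sorted"
LC_ALL=C sort -k1,1 "$tmp/bar.txt" > "$tmp/bar.sorted"

# After join, $2 comes from foo.txt and $3 from bar.txt.
joined=$(LC_ALL=C join "$tmp/foo.sorted" "$tmp/bar.sorted" |
    awk '{ print $1, ($2 - $3) / sqrt($3) }')

printf '%s\n' "$joined"
rm -rf "$tmp"
```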

CodePudding user response:

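A Perl solution (assuming the columns are tab-separated; for space-separated columns, drop -F'\t' so that -a splits on whitespace):

paste foo.txt bar.txt | \
  perl -F'\t' -lane 'print join "\t", $F[0], ( ($F[1] - $F[3]) / ($F[3])**0.5 );' > out.txt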

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-F'\t' : Split into @F on TAB, rather than on whitespace. The array @F is zero-indexed.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

CodePudding user response:

Yet another way of writing it:

$ awk '{
    if($1 in a)                       # key seen before, i.e. this is the 2nd file
        print $1,(a[$1]-$2)/sqrt($2)  # compute and output
    else                              # otherwise this is the 1st file
        a[$1]=$2                      # store the value under its key
}' foo.txt bar.txt

Some output:

QGP 10.7947
TGP 2.94732
KGP 1.98107
...
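One caveat with the "$1 in a" test is that it relies on every key appearing exactly once per file. The sketch below uses a hypothetical foo.txt in which QGP is repeated; the second QGP row is then treated as if it came from the second file.

```shell
tmp=$(mktemp -d)
printf 'QGP 1044\nQGP 900\n' > "$tmp/foo.txt"    # duplicate key (hypothetical data)
printf 'QGP 748.6421\n' > "$tmp/bar.txt"

dup=$(awk '{
    if ($1 in a)                            # true for the repeated QGP row too
        print $1, (a[$1] - $2) / sqrt($2)
    else
        a[$1] = $2
}' "$tmp/foo.txt" "$tmp/bar.txt")

printf '%s\n' "$dup"
rm -rf "$tmp"
```

Here an extra line, QGP 4.8, is printed from (1044 - 900)/sqrt(900) before the expected QGP 10.7947. With unique keys, as in the question, the approach behaves as shown above.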