How to optimize awk search and replace script

Time:03-10

Here is the problem.

I have a file in which I need to replace the numbers in the first and second columns with their corresponding names/IDs.

The file in question, which we shall call samples.txt, looks like this:

a   b   nSites  J9  J8  J7  J6  J5  J4  J3  J2  J1  rab Fa  Fb  theta   inbred_relatedness_1_2  inbred_relatedness_2_1  fraternity  identity    zygosity    2of3_IDB    FDiff   loglh   nIter   bestoptimll coverage    2dsfs   R0  R1  KING    2dsfs_loglike   2dsfsf_niter
0   1   110869  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000001    0.000004    0.000000    0.000001    0.000000    0.000000    0.000001    0.000000    0.000001    0.000003    0.000001    -172458.670509  58  -172458.743214  0.999964    1.837844e-01,1.472857e-01,3.964455e-02,1.549278e-01,1.560343e-01,9.930636e-02,3.822848e-02,9.223201e-02,8.855644e-02    0.499076    0.272966    0.000358    -231585.329751  5
0   2   110862  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    -183086.681714  38  -183086.685699  0.999901    1.825920e-01,1.500752e-01,3.805942e-02,1.515067e-01,1.627393e-01,9.600464e-02,4.057636e-02,9.130610e-02,8.714018e-02    0.483201    0.286751    0.006714    -231450.057989  6
0   3   110862  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000003    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000001    0.000001    -183177.485139  45  -183177.507865  0.999901    1.829350e-01,1.450782e-01,4.270378e-02,1.540806e-01,1.580381e-01,9.817012e-02,4.022413e-02,9.105947e-02,8.771068e-02    0.524734    0.276621    -0.009718   -232160.853650  6
0   4   110865  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000001    0.000006    0.000001    0.000001    0.000000    0.000001    0.000000    0.000000    0.000000    0.000004    0.000003    -185038.457036  53  -185038.537727  0.999928    1.763022e-01,1.560086e-01,3.841510e-02,1.517147e-01,1.659002e-01,9.265872e-02,3.999819e-02,9.325335e-02,8.574894e-02    0.472653    0.290011    0.010993    -231288.726053  5
0   5   110865  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000002    0.000005    0.000002    0.000002    0.000002    0.000002    0.000000    0.000002    0.000002    0.000003    0.000001    -186050.631757  47  -186050.757441  0.999928    1.724535e-01,1.597864e-01,3.847920e-02,1.523892e-01,1.663541e-01,9.152108e-02,4.180401e-02,9.761899e-02,7.959339e-02    0.482604    0.286029    0.006939    -231368.083428  6
1   2   110866  0.175538    0.824462    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.412231    0.000000    0.000000    0.206115    0.000000    0.000000    0.000000    0.000000    0.000000    0.412231    0.000000    -159435.171396  126 -1  0.999937    2.600724e-01,1.168766e-01,1.398141e-05,1.145118e-01,1.969803e-01,8.406278e-02,3.143034e-07,9.043812e-02,1.370437e-01    0.000073    0.485288    0.246236    -207541.041229  13
1   3   110866  0.079102    0.920898    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.460449    0.000000    0.000000    0.230225    0.000000    0.000000    0.000000    0.000000    0.000000    0.460449    0.000000    -158580.390547  134 -1  0.999937    2.647743e-01,1.121666e-01,1.169660e-05,1.124294e-01,1.954374e-01,8.770696e-02,1.354863e-05,8.664493e-02,1.408153e-01    0.000129    0.489851    0.247381    -206817.111836  13
1   4   110869  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000001    0.000000    0.000002    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000002    -0.000001   -165161.382582  46  -165161.438820  0.999964    1.746199e-01,1.620008e-01,4.033822e-02,1.490423e-01,1.539301e-01,9.257268e-02,4.436021e-02,9.920260e-02,8.393319e-02    0.550240    0.262001    -0.019079   -232444.799468  5
1   5   110869  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000001    -0.000000   -166172.116239  54  -166172.136306  0.999964    1.753381e-01,1.619384e-01,3.967325e-02,1.464603e-01,1.606635e-01,8.843158e-02,4.485591e-02,1.011443e-01,8.149469e-02    0.526125    0.275815    -0.010246   -232079.896156  5
2   3   110860  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000001    0.000001    0.000001    0.000001    0.000001    0.000001    0.000000    0.000001    0.000001    0.000001    -0.000000   -173932.994973  42  -173933.036121  0.999883    1.866618e-01,1.443290e-01,4.370005e-02,1.507040e-01,1.596694e-01,9.379667e-02,3.991130e-02,9.013509e-02,9.109267e-02    0.523653    0.283818    -0.009462   -232151.462292  6
2   4   110864  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    -175792.304256  98  -175792.304466  0.999919    1.732800e-01,1.590686e-01,4.234024e-02,1.535167e-01,1.620891e-01,8.853151e-02,4.122604e-02,9.397397e-02,8.597382e-02    0.515558    0.280113    -0.006156   -231990.297434  6
2   5   110862  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    -176798.506531  90  -176798.512845  0.999901    1.710777e-01,1.636463e-01,3.995366e-02,1.531499e-01,1.614073e-01,8.958210e-02,4.242638e-02,9.869295e-02,8.006375e-02    0.510386    0.274759    -0.004050   -231742.876506  6
3   4   110863  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000003    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000002    0.000001    -175883.748337  54  -175883.792877  0.999910    1.738435e-01,1.626377e-01,4.081213e-02,1.500631e-01,1.501179e-01,9.394541e-02,4.411551e-02,1.024025e-01,8.206222e-02    0.565739    0.252734    -0.024389   -232587.265425  6
3   5   110862  1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000001    0.000000    0.000001    0.000001    -0.000000   -176893.423638  66  -176893.448036  0.999901    1.741417e-01,1.618187e-01,4.133086e-02,1.465582e-01,1.591761e-01,8.839478e-02,4.595996e-02,1.027492e-01,7.987048e-02    0.548392    0.271256    -0.018836   -232451.945313  6
4   5   110867  0.175600    0.824400    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.412200    0.000000    0.000000    0.206100    0.000000    0.000000    0.000000    0.000000    0.000000    0.412200    -0.000000   -173922.125305  82  -1  0.999946    2.436631e-01,1.243375e-01,5.833181e-08,1.229663e-01,2.042798e-01,8.790167e-02,6.085541e-07,9.511358e-02,1.217374e-01    0.000003    0.474716    0.243514    -208520.938655  9

Here is my current solution: a bash script that I call on the file I want to modify.

Example: bash change-column-value.sh samples.txt

FILE=$1
# Replace one ID in columns 1 and 2 of $FILE; each call runs awk twice.
function main {
  awk -v search="$1" -v replace="$2" '$1 == search { $1 = replace }1' "$FILE" > tmp
  awk -v search="$1" -v replace="$2" '$2 == search { $2 = replace }1' tmp > "$FILE"
  rm tmp
}

main 0 A0081.bam
main 1 A0082.bam
main 2 A0083.bam
main 3 A0084.bam
main 4 A0085.bam
main 5 A0086.bam

Final result:

a   b   nSites  J9  J8  J7  J6  J5  J4  J3  J2  J1  rab Fa  Fb  theta   inbred_relatedness_1_2  inbred_relatedness_2_1  fraternity  identity    zygosity    2of3_IDB    FDiff   loglh   nIter   bestoptimll coverage    2dsfs   R0  R1  KING    2dsfs_loglike   2dsfsf_niter
A0081.bam A0082.bam 110869 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000004 0.000000 0.000001 0.000000 0.000000 0.000001 0.000000 0.000001 0.000003 0.000001 -172458.670509 58 -172458.743214 0.999964 1.837844e-01,1.472857e-01,3.964455e-02,1.549278e-01,1.560343e-01,9.930636e-02,3.822848e-02,9.223201e-02,8.855644e-02 0.499076 0.272966 0.000358 -231585.329751 5
A0081.bam A0083.bam 110862 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -183086.681714 38 -183086.685699 0.999901 1.825920e-01,1.500752e-01,3.805942e-02,1.515067e-01,1.627393e-01,9.600464e-02,4.057636e-02,9.130610e-02,8.714018e-02 0.483201 0.286751 0.006714 -231450.057989 6
A0081.bam A0084.bam 110862 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000003 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000001 -183177.485139 45 -183177.507865 0.999901 1.829350e-01,1.450782e-01,4.270378e-02,1.540806e-01,1.580381e-01,9.817012e-02,4.022413e-02,9.105947e-02,8.771068e-02 0.524734 0.276621 -0.009718 -232160.853650 6
A0081.bam A0085.bam 110865 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000006 0.000001 0.000001 0.000000 0.000001 0.000000 0.000000 0.000000 0.000004 0.000003 -185038.457036 53 -185038.537727 0.999928 1.763022e-01,1.560086e-01,3.841510e-02,1.517147e-01,1.659002e-01,9.265872e-02,3.999819e-02,9.325335e-02,8.574894e-02 0.472653 0.290011 0.010993 -231288.726053 5
A0081.bam A0086.bam 110865 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000002 0.000005 0.000002 0.000002 0.000002 0.000002 0.000000 0.000002 0.000002 0.000003 0.000001 -186050.631757 47 -186050.757441 0.999928 1.724535e-01,1.597864e-01,3.847920e-02,1.523892e-01,1.663541e-01,9.152108e-02,4.180401e-02,9.761899e-02,7.959339e-02 0.482604 0.286029 0.006939 -231368.083428 6
A0082.bam A0083.bam 110866 0.175538 0.824462 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.412231 0.000000 0.000000 0.206115 0.000000 0.000000 0.000000 0.000000 0.000000 0.412231 0.000000 -159435.171396 126 -1 0.999937 2.600724e-01,1.168766e-01,1.398141e-05,1.145118e-01,1.969803e-01,8.406278e-02,3.143034e-07,9.043812e-02,1.370437e-01 0.000073 0.485288 0.246236 -207541.041229 13
A0082.bam A0084.bam 110866 0.079102 0.920898 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.460449 0.000000 0.000000 0.230225 0.000000 0.000000 0.000000 0.000000 0.000000 0.460449 0.000000 -158580.390547 134 -1 0.999937 2.647743e-01,1.121666e-01,1.169660e-05,1.124294e-01,1.954374e-01,8.770696e-02,1.354863e-05,8.664493e-02,1.408153e-01 0.000129 0.489851 0.247381 -206817.111836 13
A0082.bam A0085.bam 110869 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000000 0.000002 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000002 -0.000001 -165161.382582 46 -165161.438820 0.999964 1.746199e-01,1.620008e-01,4.033822e-02,1.490423e-01,1.539301e-01,9.257268e-02,4.436021e-02,9.920260e-02,8.393319e-02 0.550240 0.262001 -0.019079 -232444.799468 5
A0082.bam A0086.bam 110869 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 -0.000000 -166172.116239 54 -166172.136306 0.999964 1.753381e-01,1.619384e-01,3.967325e-02,1.464603e-01,1.606635e-01,8.843158e-02,4.485591e-02,1.011443e-01,8.149469e-02 0.526125 0.275815 -0.010246 -232079.896156 5
A0083.bam A0084.bam 110860 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000001 0.000001 0.000001 0.000001 0.000001 0.000000 0.000001 0.000001 0.000001 -0.000000 -173932.994973 42 -173933.036121 0.999883 1.866618e-01,1.443290e-01,4.370005e-02,1.507040e-01,1.596694e-01,9.379667e-02,3.991130e-02,9.013509e-02,9.109267e-02 0.523653 0.283818 -0.009462 -232151.462292 6
A0083.bam A0085.bam 110864 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -175792.304256 98 -175792.304466 0.999919 1.732800e-01,1.590686e-01,4.234024e-02,1.535167e-01,1.620891e-01,8.853151e-02,4.122604e-02,9.397397e-02,8.597382e-02 0.515558 0.280113 -0.006156 -231990.297434 6
A0083.bam A0086.bam 110862 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 -176798.506531 90 -176798.512845 0.999901 1.710777e-01,1.636463e-01,3.995366e-02,1.531499e-01,1.614073e-01,8.958210e-02,4.242638e-02,9.869295e-02,8.006375e-02 0.510386 0.274759 -0.004050 -231742.876506 6
A0084.bam A0085.bam 110863 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000003 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000002 0.000001 -175883.748337 54 -175883.792877 0.999910 1.738435e-01,1.626377e-01,4.081213e-02,1.500631e-01,1.501179e-01,9.394541e-02,4.411551e-02,1.024025e-01,8.206222e-02 0.565739 0.252734 -0.024389 -232587.265425 6
A0084.bam A0086.bam 110862 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000001 0.000000 0.000001 0.000001 -0.000000 -176893.423638 66 -176893.448036 0.999901 1.741417e-01,1.618187e-01,4.133086e-02,1.465582e-01,1.591761e-01,8.839478e-02,4.595996e-02,1.027492e-01,7.987048e-02 0.548392 0.271256 -0.018836 -232451.945313 6
A0085.bam A0086.bam 110867 0.175600 0.824400 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.412200 0.000000 0.000000 0.206100 0.000000 0.000000 0.000000 0.000000 0.000000 0.412200 -0.000000 -173922.125305 82 -1 0.999946 2.436631e-01,1.243375e-01,5.833181e-08,1.229663e-01,2.042798e-01,8.790167e-02,6.085541e-07,9.511358e-02,1.217374e-01 0.000003 0.474716 0.243514 -208520.938655 9

As is, this solution works, but it feels inefficient. Is there a way to call awk only once and still modify the file as I wish?

CodePudding user response:

To do all the replacements in a single awk call, you could use:

awk -v from='0 1 2 3 4 5' -v to='A0081.bam A0082.bam A0083.bam A0084.bam A0085.bam A0086.bam' '
    BEGIN {
        fromCount = split(from, fromArr)
        toCount = split(to, toArr)
        for (i = 1; i <= fromCount; i++)
            tr[fromArr[i]] = toArr[i]
    }
    $1 in tr { $1 = tr[$1] }
    $2 in tr { $2 = tr[$2] }
    1
' samples.txt
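
If the list of IDs grows, the pairs could also be kept in a separate file instead of two -v strings. The following is only a sketch under that assumption: the function name rename_ids and the mapping file ids.map (one "old new" pair per line) are hypothetical and are not part of the question or the answer above. It loads the mapping while awk reads the first file, then rewrites columns 1 and 2 of the second file in the same awk call, leaving the header row alone.

```shell
#!/usr/bin/env bash
# Sketch: rename IDs in columns 1 and 2 from a mapping file in one awk pass.
# rename_ids and the "old new" mapping-file layout are hypothetical.
rename_ids() {
    local map=$1 data=$2
    awk '
        NR == FNR { tr[$1] = $2; next }    # first file: load old -> new pairs
        FNR > 1 {                          # second file: skip the header row
            if ($1 in tr) $1 = tr[$1]
            if ($2 in tr) $2 = tr[$2]
        }
        1                                  # print every line, changed or not
    ' "$map" "$data"
}
```

It could be called as rename_ids ids.map samples.txt > samples.new && mv samples.new samples.txt, since awk writes to standard output rather than editing the file. The NR == FNR test assumes the mapping file is not empty. Assigning to $1 or $2 makes awk rejoin the line with its output field separator (a single space by default), which matches the final result shown in the question.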