Take mean of columns in text file by every 10-row blocks in bash

Time:02-15

I have a tab-delimited text file with two columns and no header. I want to take the mean of each column within blocks of 10 rows: take the first 10 rows, compute the mean of the 10 numbers in each column, and write the two means to another text file. Then take the next 10 rows and do the same, and so on until the end of the file. If fewer than 10 rows are left at the end, just take the mean of the remaining rows.

Input file:

0.32832977  3.50941E-10
0.31647876  3.38274E-10
0.31482627  3.36508E-10
0.31447645  3.36134E-10
0.31447645  3.36134E-10
0.31396809  3.35591E-10
0.31281157  3.34354E-10
0.312004    3.33491E-10
0.31102326  3.32443E-10
0.30771822  3.2891E-10
0.30560062  3.26647E-10
0.30413213  3.25077E-10
0.30373717  3.24655E-10
0.29636685  3.16777E-10
0.29622422  3.16625E-10
0.29590765  3.16286E-10
0.2949896   3.15305E-10
0.29414582  3.14403E-10
0.28841901  3.08282E-10
0.28820667  3.08055E-10
0.28291832  3.02403E-10
0.28243792  3.01889E-10
0.28156429  3.00955E-10
0.28043638  2.9975E-10
0.27872239  2.97918E-10
0.27833349  2.97502E-10
0.27825573  2.97419E-10
0.27669023  2.95746E-10
0.27645657  2.95496E-10

Expected output text file:

0.314611284 3.36278E-10
0.296772974 3.172112E-10
0.279535036 2.987864E-10

I tried this code, but I don't know how to include the loop for every block of 10 rows:

awk '{x+=$1;next}END{print x/NR}' file

CodePudding user response:

Here is an awk script to do this:

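awk -v m=10 -v OFS="\t" '
# first row of a block: reset the sums
FNR%m==1{sum1=0;sum2=0}
# every row: accumulate both columns
{sum1+=$1;sum2+=$2}
# last row of a full block: print the means
FNR%m==0{print sum1/m,sum2/m; lfnr=FNR; next}
# leftover rows, if any: print their means
END{if(FNR>lfnr) print sum1/(FNR-lfnr),sum2/(FNR-lfnr)}' file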

Prints:

0.314611    3.36278e-10
0.296773    3.17211e-10
0.279535    2.98786e-10

Or, if you want up to 9 significant digits as in your expected output, you can use printf:

awk -v m=10 -v OFS="\t" '
FNR%m==1{sum1=0;sum2=0}
{sum1+=$1;sum2+=$2}
FNR%m==0{printf("%0.9G%s%0.9G\n",sum1/m,OFS,sum2/m); lfnr=FNR; next}
END{if(FNR>lfnr) printf("%0.9G%s%0.9G\n",sum1/(FNR-lfnr),OFS,sum2/(FNR-lfnr))}' file

Prints:

0.314611284 3.36278E-10
0.296772974 3.172112E-10
0.279535036 2.98786444E-10
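
As a variation on the same idea, here is a minimal sketch that counts the rows in the current block instead of remembering the last printed line number. The function name block_means and its optional second argument (the block size, defaulting to 10) are made up for illustration; they are not part of the scripts above. Because the END rule only prints when rows are left over, a file whose length is an exact multiple of the block size produces no extra line.

```shell
# block_means FILE [M]: print the per-column mean of every M-row block (hypothetical helper)
block_means() {
  awk -v m="${2:-10}" -v OFS="\t" '
    # every row: accumulate both columns and count the rows in this block
    { sum1 += $1; sum2 += $2; n++ }
    # block is full: print its means and start over
    n == m { print sum1/n, sum2/n; sum1 = sum2 = n = 0 }
    # leftover rows at the end of the file, if any
    END { if (n > 0) print sum1/n, sum2/n }
  ' "$1"
}
```

It would be called as block_means file > means.txt, where means.txt is just an example output name.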

Your superpower here is the % modulo operator, which lets you detect every m-th line, in this case every 10th. Your x-ray vision is the awk special variable FNR, which holds the number of the line currently being read from the file.

FNR%m is always less than m. When it is 0 you are on the last row of a block of m, so it is time to print. When it is 1 you are on the first row of a block, so it is time to reset the sums.
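
To see the modulo pattern on its own, this small sketch prints FNR next to FNR%m for seven lines with m=3. The function name show_fnr_mod and the use of seq to generate input are assumptions for the demo only.

```shell
# show_fnr_mod: print each line number and its remainder modulo m=3 (hypothetical demo)
show_fnr_mod() {
  seq 1 7 | awk -v m=3 '{ print FNR, FNR % m }'
}

show_fnr_mod
# 1 1   <- first row of a block: reset
# 2 2
# 3 0   <- last row of a block: print
# 4 1
# 5 2
# 6 0
# 7 1   <- leftover row, handled in END
```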
