How to get average of every nth lines

Time:07-28

I have data of about a thousand lines like this:

1.01
2.01
3.01
4.01
5.012
6.019
7.01
8.013
9.01
10.01
11.01
12.01
13.01
14.01
15.5

I would like to get the average of every 9 lines as a moving window centred on each line (or, more generally, of every n lines if 9 is too short). This command does not compute the averages I expected:

awk '{sum[$1]=sum[$1] + $1; nr[$1]++} END {for (a in sum) {print a, sum[a]/nr[a]}}' test.txt

Expected results:

1.01    
2.01    
3.01    
4.01    
5.012   5.011555556 #(1.01+2.01+3.01+4.01+5.012+6.019+7.01+8.013+9.01)/9
6.019   6.011555556 #(     2.01+3.01+4.01+5.012+6.019+7.01+8.013+9.01+10.01)/9
7.01    7.011555556 #(          3.01+4.01+5.012+6.019+7.01+8.013+9.01+10.01+11.01)/9
8.013   8.011555556
9.01    9.011555556
10.01   10.01133333
11.01   11.06477778
12.01
13.01
14.01
15.5
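The expected results above are a centred moving average: a line gets the mean of itself and the (n-1)/2 lines on either side, and lines without a full window are printed unchanged. As a sketch only (not taken from either answer below), this can be done in a single pass by keeping a running sum, subtracting the value that falls out of the window, and delaying each line's output until the lines after it have been read. The window size `n` (assumed odd), the temporary file `$data` and the array names are illustrative assumptions.

```shell
# Sample data from the question, written to a temporary file (illustrative).
data=$(mktemp)
printf '%s\n' 1.01 2.01 3.01 4.01 5.012 6.019 7.01 8.013 9.01 \
  10.01 11.01 12.01 13.01 14.01 15.5 > "$data"

# Centred moving average over n lines (n assumed odd), single pass.
out=$(awk -v n=9 '
BEGIN { OFS = "\t"; h = int(n / 2) }   # h = lines on each side of the centre
{
  txt[NR] = $0                         # raw line, printed once its window is known
  val[NR] = $1                         # numeric value of the line
  sum += $1                            # add the newest value to the window
  if (NR > n) sum -= val[NR - n]       # drop the value that just left the window
  c = NR - h                           # centre line of the window ending at NR
  if (c < 1) next                      # no line can be printed yet
  if (NR >= n) print txt[c], sum / n   # full window: line plus its average
  else print txt[c]                    # incomplete window: line unchanged
}
END {
  for (c = NR - h + 1; c <= NR; c++)   # the last h lines have no full window
    if (c >= 1) print txt[c]
}' "$data")
printf '%s\n' "$out"
```

On the sample data this should reproduce the expected results to six significant digits (awk's default `OFMT` of `%.6g`). Storing every line keeps the sketch short; for very large files, entries older than the window could be removed with `delete`.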

Answer:

As per the OP's logic, if averages need to be printed for n/2 lines (or (n-1)/2 lines when n is odd), where n is the total number of lines, then the following will help.

awk '
BEGIN{OFS="\t"}
FNR==NR{
  val[FNR]=$0
  count=NR
  next
}
!till{
  till=(count%2==0?count/2:(count-1)/2)
}
till && FNR>4{ flag=1 }
flag && ++count1<=till+0{
  sum=""
  for(i=count1;i<=count1+8;i++){
    sum+=val[i]
  }
  $0=$0 OFS (sum/9)
}
1
'  Input_file  Input_file

With the shown samples, the output will be as follows:

1.01
2.01
3.01
4.01
5.012   5.01156
6.019   6.01156
7.01    7.01156
8.013   8.01156
9.01    9.01156
10.01   10.0113
11.01   11.0648
12.01
13.01
14.01
15.5

Explanation: a detailed explanation of the above code.

awk '                                    ##Starting awk program from here.
BEGIN{OFS="\t"}                          ##Setting OFS as tab here.
FNR==NR{                                 ##TRUE only while reading the first Input_file.
  val[FNR]=$0                            ##Creating val array with index FNR and value $0.
  count=NR                               ##Setting count to NR (total lines once pass 1 ends).
  next                                   ##next skips all further statements for this line.
}
!till{                                   ##If till is NOT set then do the following.
  till=(count%2==0?count/2:(count-1)/2)  ##Setting till from the total number of lines (logic by OP).
}
till && FNR>4{ flag=1 }                  ##If till is SET and line number is greater than 4, set flag.
flag && ++count1<=till+0{                ##When flag is SET and incremented count1 is <= till, do the following:
  sum=""                                 ##Nullifying sum here.
  for(i=count1;i<=count1+8;i++){         ##Looping over the 9 values starting at index count1.
    sum+=val[i]                          ##Adding the value of val at index i into sum.
  }
  $0=$0 OFS (sum/9)                      ##Appending the average to the current line.
}
1                                        ##Printing edited/non-edited line here.
'  Input_file  Input_file                ##Passing Input_file twice: pass 1 stores values, pass 2 prints.
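The code above hard-codes the window (`sum/9`, `count1+8`), the first averaged line (`FNR>4`) and the number of averaged lines (`till`), so it fits the 15-line sample, but on a longer file it stops adding averages about halfway through. A sketch of the same two-pass idea with the window size passed in as a variable `n` (assumed odd; the name, like the temporary file `$data`, is illustrative) could look like this:

```shell
# Sample data from the question, written to a temporary file (illustrative).
data=$(mktemp)
printf '%s\n' 1.01 2.01 3.01 4.01 5.012 6.019 7.01 8.013 9.01 \
  10.01 11.01 12.01 13.01 14.01 15.5 > "$data"

# Two passes over the same file: pass 1 stores values, pass 2 prints.
out=$(awk -v n=9 '
BEGIN { OFS = "\t"; h = int(n / 2) }            # h = lines on each side of the centre
FNR == NR { val[FNR] = $1; total = FNR; next }  # pass 1: remember every value
FNR > h && FNR <= total - h {                   # pass 2: line has a full window
  sum = 0
  for (i = FNR - h; i <= FNR + h; i++) sum += val[i]
  $0 = $0 OFS (sum / n)                         # append the average to the line
}
1                                               # print every pass-2 line
' "$data" "$data")
printf '%s\n' "$out"
```

Deriving the first and last averaged lines from `h` and `total` is what lets the same script handle any file length and any odd window size.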

Answer:

You can do it quite simply without arrays. Carry a running sum and a line counter; on the nth line, compute the average, print it, and reset both the sum and the line counter to 0. For example, you could do:

awk -v nth=3 '
  {
    sum+=$1
    ++line
    printf "line: %2d  sum: %6.3f", line, sum
  }
  line == nth {
    print "   avg: " sum/nth
    line = sum = 0
    next
  }
  { print "" }
' values.txt

(Above, the number of lines to average is provided as the variable nth with -v nth=3.)

Example Use/Output

With your sample data in values.txt, you can run the script by pasting it at the command line in the directory containing values.txt, and you would receive:

$ awk -v nth=3 '
>   {
>     sum+=$1
>     ++line
>     printf "line: %2d  sum: %6.3f", line, sum
>   }
>   line == nth {
>     print "   avg: " sum/nth
>     line = sum = 0
>     next
>   }
>   { print "" }
> ' values.txt
line:  1  sum:  1.010
line:  2  sum:  3.020
line:  3  sum:  6.030   avg: 2.01
line:  1  sum:  4.010
line:  2  sum:  9.022
line:  3  sum: 15.041   avg: 5.01367
line:  1  sum:  7.010
line:  2  sum: 15.023
line:  3  sum: 24.033   avg: 8.011
line:  1  sum: 10.010
line:  2  sum: 21.020
line:  3  sum: 33.030   avg: 11.01
line:  1  sum: 13.010
line:  2  sum: 27.020
line:  3  sum: 42.520   avg: 14.1733

Above, with nth = 3, the average is computed for every block of 3 lines. For every line, its position within the current block is printed along with the running sum.
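If only the block averages are wanted, without the per-line trace, a compact variant of the same running-sum idea is sketched below. The `lines a-b: avg` output format and the temporary file `$data` are illustrative choices, and the `END` block, which averages a trailing partial block over the lines it actually contains, is an assumption about how leftover lines should be handled; with `nth=4` the 15-line sample leaves three such lines.

```shell
# Sample data from the question, written to a temporary file (illustrative).
data=$(mktemp)
printf '%s\n' 1.01 2.01 3.01 4.01 5.012 6.019 7.01 8.013 9.01 \
  10.01 11.01 12.01 13.01 14.01 15.5 > "$data"

out=$(awk -v nth=4 '
  { sum += $1 }                         # accumulate the current block
  NR % nth == 0 {                       # last line of a complete block
    print "lines " (NR - nth + 1) "-" NR ": avg " sum / nth
    sum = 0                             # start the next block
  }
  END {
    rest = NR % nth                     # lines in an incomplete final block
    if (rest) print "lines " (NR - rest + 1) "-" NR ": avg " sum / rest
  }' "$data")
printf '%s\n' "$out"
```

Dropping the `END` block gives exactly the behaviour of the script above, where leftover lines that do not fill a block are never averaged.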

If you wanted to add the actual line number to your output, you could simply use FNR to provide it, e.g.

awk -v nth=3 '
  {
    sum+=$1
    ++line
    printf "no. %2d  line: %2d  sum: %6.3f", FNR, line, sum
  }
  line == nth {
    print "   avg: " sum/nth
    line = sum = 0
    next
  }
  { print "" }
' values.txt

awk is an amazingly flexible tool that can do just about any type of text processing you need.

Sample Output

The first few lines of output would now be:

no.  1  line:  1  sum:  1.010
no.  2  line:  2  sum:  3.020
no.  3  line:  3  sum:  6.030   avg: 2.01
no.  4  line:  1  sum:  4.010
no.  5  line:  2  sum:  9.022
no.  6  line:  3  sum: 15.041   avg: 5.01367
...