I have data of a thousand lines like this:
1.01
2.01
3.01
4.01
5.012
6.019
7.01
8.013
9.01
10.01
11.01
12.01
13.01
14.01
15.5
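I would like to get the average of every 9 lines (or, more generally, every n lines if 9 is too short), that is, for each line the average of the 9-line window centered on it, as in the expected results below. This command does not compute the averages I expected:
awk '{sum[$1]=sum[$1]+$1; nr[$1]++} END {for (a in sum) {print a, sum[a]/nr[a]}}' test.txt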
Expected results:
1.01
2.01
3.01
4.01
5.012 5.011555556 #(1.01 2.01 3.01 4.01 5.012 6.019 7.01 8.013 9.01)/9
6.019 6.011555556 #( 2.01 3.01 4.01 5.012 6.019 7.01 8.013 9.01 10.01)/9
7.01 7.011555556 #( 3.01 4.01 5.012 6.019 7.01 8.013 9.01 10.01 11.01)/9
8.013 8.011555556
9.01 9.011555556
10.01 10.01133333
11.01 11.06477778
12.01
13.01
14.01
15.5
Answer:
As per the OP, if averages need to be printed for n/2 lines (or (n-1)/2 lines when the total line count n is odd), starting from line 5, then the following will help.
awk '
BEGIN{OFS="\t"}
FNR==NR{
  val[FNR]=$0
  count=NR
  next
}
!till{
  till=(count%2==0?count/2:(count-1)/2)
}
till && FNR>4{ flag=1 }
flag && ++count1<=till+0{
  sum=""
  for(i=count1;i<=count1+8;i++){
    sum+=val[i]
  }
  $0=$0 OFS (sum/9)
}
1
' Input_file Input_file
With the shown sample, the output will be as follows:
1.01
2.01
3.01
4.01
5.012 5.01156
6.019 6.01156
7.01 7.01156
8.013 8.01156
9.01 9.01156
10.01 10.0113
11.01 11.0648
12.01
13.01
14.01
15.5
Explanation: a detailed explanation of the above code.
awk '                                   ##Starting awk program from here.
BEGIN{OFS="\t"}                         ##Setting OFS as tab here.
FNR==NR{                                ##Checking FNR==NR, which is TRUE only while reading the first Input_file.
  val[FNR]=$0                           ##Creating val array with index FNR and value $0.
  count=NR                              ##Setting count to NR, which ends up as the total number of lines.
  next                                  ##next skips all further statements for this line.
}
!till{                                  ##If till is NOT set then do the following.
  till=(count%2==0?count/2:(count-1)/2) ##Setting till from the total number of lines (logic as per OP).
}
till && FNR>4{ flag=1 }                 ##If till is SET and line number is greater than 4, set flag.
flag && ++count1<=till+0{               ##When flag is SET, increment count1 and, if it is less than or equal to till, do the following:
  sum=""                                ##Nullifying sum here.
  for(i=count1;i<=count1+8;i++){        ##Looping over the 9 values val[count1] to val[count1+8].
    sum+=val[i]                         ##Adding val[i] to sum.
  }
  $0=$0 OFS (sum/9)                     ##Appending the average to the current line.
}
1                                       ##Printing the edited/non-edited line here.
' Input_file Input_file                 ##Passing Input_file twice: first pass stores the lines, second pass prints them.
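The code above is fixed to a 9-line window and to line 5 as the first averaged line. Below is a minimal single-pass sketch for a general window size n, assuming (as the expected results suggest) that an average should be printed for every line that has a full centered window of n lines, and that n is odd. The function name movavg is hypothetical, and the sample data is written to values.txt only so the snippet runs on its own.

```shell
# movavg N FILE: hypothetical helper. Prints every line of FILE and, for lines
# that have a full centered window of N lines (N assumed odd), the window average.
movavg() {
    awk -v n="$1" '
    BEGIN { h = int(n / 2); OFS = "\t" }   # h = lines on each side of the center
    {
        val[NR] = $1
        sum += $1                          # running sum of the last n values
        if (NR > n) sum -= val[NR - n]     # drop the value that left the window
        if (NR >= n) {
            c = NR - h                     # center line of the current full window
            print val[c], sum / n
        } else if (NR <= h) {
            print val[NR]                  # leading lines: no full window possible
        }
        if (NR > n) delete val[NR - n]     # keep only the last n values in memory
    }
    END {
        # trailing lines whose window would run past the end of the input
        start = (NR >= n) ? NR - h + 1 : h + 1
        for (i = start; i <= NR; i++) print val[i]
    }
    ' "$2"
}

# Sample data from the question
printf '%s\n' 1.01 2.01 3.01 4.01 5.012 6.019 7.01 8.013 9.01 \
    10.01 11.01 12.01 13.01 14.01 15.5 > values.txt
movavg 9 values.txt
```

Because it reads the input only once and keeps just the last n values, this variant also works on a pipe; for the 15 sample lines it prints averages on lines 5 through 11, like the output shown above.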
Answer:
You can compute the average of each consecutive (non-overlapping) group of n lines quite simply without arrays. Carry a sum variable and a line counter, then on the nth line compute the average, output it, and reset sum and line to 0. For example, you could do:
awk -v nth=3 '
{
  sum+=$1
  line++
  printf "line: %2d  sum: %6.3f", line, sum
}
line == nth {
  print " avg: " sum/nth
  line = sum = 0
  next
}
{ print "" }
' values.txt
(Above, the number of lines to average over is provided as the variable nth with -v nth=3.)
Example Use/Output
With your sample data in values.txt, you can run the script by pasting it at the command line in the directory containing values.txt, and you would receive:
$ awk -v nth=3 '
> {
>   sum+=$1
>   line++
>   printf "line: %2d  sum: %6.3f", line, sum
> }
> line == nth {
>   print " avg: " sum/nth
>   line = sum = 0
>   next
> }
> { print "" }
> ' values.txt
line: 1 sum: 1.010
line: 2 sum: 3.020
line: 3 sum: 6.030 avg: 2.01
line: 1 sum: 4.010
line: 2 sum: 9.022
line: 3 sum: 15.041 avg: 5.01367
line: 1 sum: 7.010
line: 2 sum: 15.023
line: 3 sum: 24.033 avg: 8.011
line: 1 sum: 10.010
line: 2 sum: 21.020
line: 3 sum: 33.030 avg: 11.01
line: 1 sum: 13.010
line: 2 sum: 27.020
line: 3 sum: 42.520 avg: 14.1733
Above, with nth = 3, the average is computed for every 3 lines. The line number within the running sum, along with the current sum, is printed for every line.
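With 15 sample lines and nth = 3 every group is complete, but when the line count is not a multiple of nth (for example 1000 lines with nth = 9), the last few lines are summed and never averaged. One way to handle that, sketched below under the assumption that a short final group should be averaged over its own length, is to add an END block to the same script. The function name blockavg is hypothetical, and the sample data is written to values.txt only so the snippet runs on its own.

```shell
# blockavg N FILE: hypothetical helper wrapping the script above, plus an END
# block that reports a trailing group shorter than N lines.
blockavg() {
    awk -v nth="$1" '
    {
        sum += $1                      # running sum of the current group
        line++                         # position within the current group
        printf "line: %2d  sum: %6.3f", line, sum
    }
    line == nth {
        print " avg: " sum/nth
        line = sum = 0                 # start the next group
        next
    }
    { print "" }
    END {
        # assumption: a short final group is averaged over its own length
        if (line > 0) print "partial group of " line " lines, avg: " sum/line
    }
    ' "$2"
}

# Sample data from the question (15 lines, so nth=4 leaves 3 lines over)
printf '%s\n' 1.01 2.01 3.01 4.01 5.012 6.019 7.01 8.013 9.01 \
    10.01 11.01 12.01 13.01 14.01 15.5 > values.txt
blockavg 4 values.txt
```

Here the final group (13.01, 14.01, 15.5) is reported as a partial group of 3 lines with an average of 14.1733; drop the END block if incomplete groups should simply be ignored.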
If you wanted to add the actual line number to your output, you could simply use FNR to provide it, e.g.
awk -v nth=3 '
{
  sum+=$1
  line++
  printf "no. %2d  line: %2d  sum: %6.3f", FNR, line, sum
}
line == nth {
  print " avg: " sum/nth
  line = sum = 0
  next
}
{ print "" }
' values.txt
awk is an amazingly flexible tool that can do just about any type of text processing you need.
Sample Output
The first few lines of output would now be:
no. 1 line: 1 sum: 1.010
no. 2 line: 2 sum: 3.020
no. 3 line: 3 sum: 6.030 avg: 2.01
no. 4 line: 1 sum: 4.010
no. 5 line: 2 sum: 9.022
no. 6 line: 3 sum: 15.041 avg: 5.01367
...