How to average a repeating interval of a row with Awk/Bash

I have a text file with the average sunspot number for each month from 1749 to 2005:

 (* Month: 1749 01 *) 58
 (* Month: 1749 02 *) 63
 (* Month: 1749 03 *) 70
 (* Month: 1749 04 *) 56
 (* Month: 1749 05 *) 85
 (* Month: 1749 06 *) 84
 (* Month: 1749 07 *) 95
 (* Month: 1749 08 *) 66
 (* Month: 1749 09 *) 76
 (* Month: 1749 10 *) 76
 (* Month: 1749 11 *) 159
 (* Month: 1749 12 *) 85
 (* Month: 1750 01 *) 73
 (* Month: 1750 02 *) 76
 (* Month: 1750 03 *) 89
 (* Month: 1750 04 *) 88
 Etc.

I need the average of the 12 months of each year, so for 1749 the result should be about 81 (973 / 12 ≈ 81.08). Averaging the sixth field ($6) over the whole file with awk seems simple:

awk '{ sum += $6 }
END { print sum / NR }' sunspot.txt

However, I don't know where to start with awk's control structures to average each group of 12 monthly values, one year at a time, for every year from 1749 to 2005.

Answer:

Here's one way:

awk '{a[$3] += $6; b[$3] += 1} END{for (i in a) print i, a[i]/b[i]}' years.txt | sort -n
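For readability, the same logic can be spread over several lines with comments. This is a minimal sketch, assuming every input line looks like the sample above (year in $3, sunspot number in $6). The function name yearly_avg and the array names sum and count are hypothetical, just more descriptive stand-ins for a and b.

```shell
# Hypothetical helper: prints "year average" for every year in the input.
# Assumes lines like: (* Month: 1749 01 *) 58
# so $3 is the year and $6 is the sunspot number.
yearly_avg() {
    awk '
    {
        sum[$3] += $6      # running total per year
        count[$3]++        # number of months seen for that year
    }
    END {
        # for (y in sum) visits keys in an unspecified order,
        # hence the sort -n afterwards.
        for (y in sum)
            print y, sum[y] / count[y]
    }
    ' "$@" | sort -n
}
```

Call it as yearly_avg years.txt, or pipe the data in with yearly_avg < years.txt; with no file argument, awk reads standard input.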

The runs below illustrate this, first averaging by month ($4), then by year ($3). Both rely on awk's associative arrays: a accumulates the sum of $6 for each key, b counts how many values were added for that key, and the END block divides one by the other to get the average. Because for (i in a) visits the keys in no particular order, the output is piped through sort -n.

$ awk '{a[$4] += $6; b[$4] += 1} END{for (i in a) print i, a[i]/b[i]}' years.txt | sort -n
01 65.5
02 69.5
03 79.5
04 72
05 85
06 84
07 95
08 66
09 76
10 76
11 159
12 85

$ awk '{a[$3] += $6; b[$3] += 1} END{for (i in a) print i, a[i]/b[i]}' years.txt | sort -n
1749 81.0833
1750 81.5
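Since the question asked about control structures: when the file is already in chronological order, an array is not strictly needed. The sketch below, assuming all months of a year sit on consecutive lines as in the sample, keeps a running total and prints the average whenever the year in $3 changes, plus once more in END for the last year. The function name yearly_avg_stream is hypothetical.

```shell
# Hypothetical helper: averages consecutive lines that share the same year ($3).
# Assumes the input is grouped by year, e.g. sorted chronologically.
yearly_avg_stream() {
    awk '
    $3 != year {                      # a new year starts on this line
        if (n > 0)                    # flush the previous year, if any
            print year, sum / n
        year = $3; sum = 0; n = 0     # reset the accumulators
    }
    { sum += $6; n++ }                # accumulate the current month
    END { if (n > 0) print year, sum / n }   # flush the last year
    ' "$@"
}
```

With the sample data this prints the same two lines as the array version, 1749 81.0833 and 1750 81.5, already in input order and without sort, and it only holds one year's running total in memory. If the file were not grouped by year, a year would be split into several partial averages, so the array version is the safer default.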