I have a txt file with the monthly average sunspot data for every year from 1749 to 2005.
(* Month: 1749 01 *) 58
(* Month: 1749 02 *) 63
(* Month: 1749 03 *) 70
(* Month: 1749 04 *) 56
(* Month: 1749 05 *) 85
(* Month: 1749 06 *) 84
(* Month: 1749 07 *) 95
(* Month: 1749 08 *) 66
(* Month: 1749 09 *) 76
(* Month: 1749 10 *) 76
(* Month: 1749 11 *) 159
(* Month: 1749 12 *) 85
(* Month: 1750 01 *) 73
(* Month: 1750 02 *) 76
(* Month: 1750 03 *) 89
(* Month: 1750 04 *) 88
Etc.
I need to average the 12 months for each year, so 1749 should come out to about 81. Averaging the whole $6 column with awk seems simple:
awk '{ sum += $6 }
END { print sum / NR }' sunspot.txt
However, I don't know where to start as far as using control structures in Awk to incrementally average each of the 12 numbers for the years between 1749 and 2005.
CodePudding user response:
Here's one way:
awk '{a[$3] += $6; b[$3]++} END{for (i in a) print i, a[i]/b[i]}' years.txt | sort -n
The examples below average by month first, then by year, for illustration. This uses awk's built-in associative arrays: the "a" array accumulates the sum for each key, and "b" counts the entries for that key. The END block divides each sum by its count to get the average. The order of "for (i in a)" is unspecified, hence the pipe through sort -n.
$ awk '{a[$4] += $6; b[$4]++} END{for (i in a) print i, a[i]/b[i]}' years.txt | sort -n
01 65.5
02 69.5
03 79.5
04 72
05 85
06 84
07 95
08 66
09 76
10 76
11 159
12 85
$ awk '{a[$3] += $6; b[$3]++} END{for (i in a) print i, a[i]/b[i]}' years.txt | sort -n
1749 81.0833
1750 81.5
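As a variation, if the file is already in chronological order (an assumption, though the sample in the question looks that way), the arrays and the sort can be dropped: keep a running sum and count, and print each year's average as soon as the year changes. Below is a minimal sketch of that idea. The function name yearly_avg and the file sample.txt are made up for illustration, and the $2 == "Month:" test is only there to skip lines that do not match the format shown in the question.

```shell
# Hypothetical helper; assumes all lines for a given year are adjacent.
yearly_avg() {
  awk '
    $2 == "Month:" {                     # skip anything that is not a data line
      if (n && $3 != year) {             # year changed: report the finished year
        print year, sum / n
        sum = n = 0
      }
      year = $3; sum += $6; n++          # accumulate the current year
    }
    END { if (n) print year, sum / n }   # report the last year
  ' "$@"
}

# Small sample taken from the data in the question
cat > sample.txt <<'EOF'
(* Month: 1749 11 *) 159
(* Month: 1749 12 *) 85
(* Month: 1750 01 *) 73
(* Month: 1750 02 *) 76
EOF

yearly_avg sample.txt
```

On this four-line sample it prints "1749 122" and "1750 74.5". If the input might not be grouped by year, the array version above is the safer choice.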