The title says it all: when I construct an array in awk by appending elements, using natural (one-based) numbers as indices, can I use for (VAR in ARRAY) to get the array elements in the correct order (i.e. the order in which the elements were added)?
As arrays in awk are associative, the question is whether the iteration would use the order that foreach (@array) would give in Perl, or, more likely, the order that while (my ($k, $v) = each %hash) would give (also in Perl). The latter may be any order.
The gawk manual states:
An important aspect to remember about arrays is that array subscripts are always strings. When a numeric value is used as a subscript, it is converted to a string value before being used for subscripting.
I know that I can construct a for loop counting up the index in awk, but still I wonder.
CodePudding user response:
What is "the correct order"? Numerically ascending? Numerically descending? First-in? Alphabetically ascending (since array indices are always strings, not numbers)? Something else?
The point is that there is no generally "correct" order that would suit all scripts, so it's up to each awk implementation to simply visit the indices as efficiently as possible.
If you have a set of numeric indices starting at, say, 1 but you don't know the max value and you want to visit them in ascending order, for example, you can do:
for (i=1; i in array; i++) {
    print array[i]
}
otherwise write your own way of tracking order, e.g. to get them in first-in order (assuming all unique indices):
order[++numIndices] = $1
array[$1] = $2
...
for (o=1; o<=numIndices; o++) {
    i = order[o]
    print array[i]
}
or use GNU awk for PROCINFO["sorted_in"] but you'd still have to write your own way of tracking first-in order even with that.
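Applied to the question's case (elements appended under the indices 1, 2, 3, ...), the counting loop can be sketched as follows. This is only an illustration in plain POSIX awk; the function name in_order and the counter n are assumptions made for the example, not anything from the question.

```shell
# in_order is a hypothetical helper name; the awk program is plain POSIX awk.
in_order() {
    awk '
        { array[++n] = $0 }    # append each input line under index 1, 2, 3, ...
        END {
            # "i in array" only tests membership, so the loop stops at the
            # first missing index without creating a new element.
            for (i = 1; i in array; i++)
                print i, array[i]
        }
    '
}

printf '%s\n' c a b | in_order
```

This prints "1 c", "2 a", "3 b", i.e. the elements in the order they were appended, whatever order for (i in array) would have used.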
CodePudding user response:
When using natural numbers as array index, will for (VAR in ARRAY) iterate in the correct order?
The GNU AWK manual's section Scanning an Array says this about such a for loop:
The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within awk and in standard awk cannot be controlled or changed. This can lead to problems if new elements are added to array by statements in the loop body; it is not predictable whether the for loop will reach them. Similarly, changing var inside the loop may produce strange results. It is best to avoid such things.
As a point of information, gawk sets up the list of elements to be iterated over before the loop starts, and does not change it. But not all awk versions do so.
So if the question pertains to GNU AWK, the answer is that there is some order imposed, which, as explained in Controlling Scanning, can be changed (selected). If it pertains to other awk versions, however, you must take care not to assume any particular order of array traversal when using for.
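As a rough sketch of what selecting the order looks like in GNU AWK: setting PROCINFO["sorted_in"] to "@ind_num_asc" asks gawk (version 4.0 or later) to visit indices in ascending numeric order. The function name scan_numeric and the two-column input layout are assumptions for illustration only, and the setting has no such effect in other awk versions.

```shell
# scan_numeric is a hypothetical helper name; it requires GNU awk (gawk) 4.0+.
scan_numeric() {
    gawk '
        { array[$1] = $2 }     # first field is the index, second the value
        END {
            # gawk-only: request ascending numeric index order for for-in loops
            PROCINFO["sorted_in"] = "@ind_num_asc"
            for (i in array)
                print i, array[i]
        }
    '
}
```

Run as printf '%s\n' '10 j' '2 b' '1 a' | scan_numeric, this should visit the indices as 1, 2, 10 rather than in string order (1, 10, 2) or in whatever the internal arrangement happens to be.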
CodePudding user response:
Both the GNU awk manual and the POSIX awk specification state that the default array traversal order is arbitrary.
From OpenGroup POSIX awk documentation:
for (variable in array) which shall iterate, assigning each index of array to variable in an unspecified order.
From GNU awk manual:
By default, when a for loop traverses an array, the order is undefined, meaning that the awk implementation determines the order in which the array is traversed. This order is usually based on the internal implementation of arrays and will vary from one version of awk to the next.
That said, some versions of GNU awk actually seem to traverse in insertion order.
Here is GNU awk:
echo 'a
b
c
d' | gawk '{arr[$1]} END{for (e in arr) print e}'
Prints:
a
b
c
d
Versus BSD awk:
echo 'a
b
c
d' | awk '{arr[$1]} END{for (e in arr) print e}'
Prints:
d
a
b
c
Even if your awk seems to use insertion order, there is no guarantee that it will in all circumstances and you should not count on that behavior.
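If insertion order is what you actually need, one portable approach is to record it yourself instead of relying on for (e in arr). The following is a minimal sketch in POSIX awk; the names first_seen, order and n are assumptions for the example, and the input deliberately repeats a key to show that only the first occurrence is recorded.

```shell
# first_seen is a hypothetical helper name; the awk program is plain POSIX awk.
first_seen() {
    awk '
        !($1 in arr) { order[++n] = $1 }   # remember a key the first time it appears
        { arr[$1]++ }                      # count occurrences of each key
        END {
            # walk the recorded order instead of using for (e in arr)
            for (i = 1; i <= n; i++)
                print order[i], arr[order[i]]
        }
    '
}

printf '%s\n' a b c d a | first_seen
```

This prints "a 2", "b 1", "c 1", "d 1" in first-seen order with any awk, since the counting loop does not depend on the internal arrangement of the array.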