Thanks to @EdMorton, I can remove duplicates from an array in awk this way:
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)
    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }
    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}
What I don't understand, though, is this ( !seen[array[i]]++ ) condition with an increment:
- I do understand that we collect unique indices in the seen array;
- So, we check whether our temp array seen already has an index array[i] (and add the value to unique if it doesn't);
- But the increment after the index is the thing I still can't get :) (despite the detailed explanation provided by Ed).
So, my question is the following: can we somehow rewrite this conditional in a more elaborate way? Maybe that would really help me finalise my take on it :)
User response:
Hope this is clearer but idk - best I can say is it's more elaborate as requested!
$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)
    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        # count how many times this value has been seen so far
        count[val] = count[val] + 1
        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }
        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }
    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}
$ awk -f tst.awk
a
b
c
d
e
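If it helps to see the short form step by step, here is a rough sketch that prints what `!seen[val]++` works with on each pass. The `trace_seen` wrapper, the `before` and `is_new` variables, and the shortened input "a b a" are only for illustration; they are not from the original script.

```sh
# Hypothetical helper: trace the counter used by the `!seen[val]++` idiom.
trace_seen() {
    awk 'BEGIN {
        n = split("a b a", array)
        for (i = 1; i <= n; i++) {
            val = array[i]
            before = seen[val] + 0   # count before this visit (unset counts as 0)
            is_new = !seen[val]++    # tests the old count, then increments it
            printf "%s before=%d is_new=%d after=%d\n", val, before, is_new, seen[val]
        }
    }'
}
trace_seen
```

This should print `a before=0 is_new=1 after=1`, then `b before=0 is_new=1 after=1`, then `a before=1 is_new=0 after=2`. The post-increment hands the old count to `!` and only then bumps it, so the test is true exactly once per value.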
User response:
Another approach is to put array's values into a new associative array as keys. That will enforce uniqueness:
BEGIN {
    # it's helpful to use the return value from `split`
    n = split("a b c d e a b", array)
    # use the element value as a key.
    # It doesn't really matter what the right-hand side of the assignment is.
    for (i = 1; i <= n; i++) uniq[array[i]] = i
    # now, it's easy to iterate over the unique keys
    for (elem in uniq) print elem
}
This outputs, in no guaranteed order:
a
b
c
d
e
If you're using GNU awk, use PROCINFO["sorted_in"] to control the order of the array traversal.
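For example, a minimal sketch of that, assuming GNU awk is installed as `gawk`. The `uniq_sorted` wrapper and the input string are made up for illustration; `"@ind_str_asc"` is one of gawk's predefined traversal orders (ascending by index, compared as strings).

```sh
# Hypothetical helper: print the unique keys in sorted order (GNU awk only).
uniq_sorted() {
    gawk 'BEGIN {
        n = split("c a b a c", array)
        for (i = 1; i <= n; i++) uniq[array[i]] = i
        # gawk-specific: visit the keys in ascending string order
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (elem in uniq) print elem
    }'
}
# Only run the example when gawk is actually available.
if command -v gawk >/dev/null 2>&1; then uniq_sorted; fi
```

With gawk this should print a, b and c on separate lines. Other awks treat PROCINFO as an ordinary array, so the assignment has no effect there and the order stays unspecified.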