Using awk to replace part of a line with the output of a program

Time:01-05

I've got a file with several columns, like so:

13:46:48 1.2.3.4:57 user1
13:46:49 5.6.7.8:58 user2
13:48:07 9.10.11.12:59 user3

I'd like to transform one of the columns by passing it as input to a program:

echo "1.2.3.4:57" | transformExternalIp
10.0.0.4:57

I wrote a small bit of awk to do this:

awk '{ ("echo " $2 " | transformExternalIp") | getline output; $2=output; print}'  

But what I got surprised me. Initially, it looked like it was working as expected, but then I started to see weird repeated values. In order to debug, I removed my fancy "transformExternalIp" program in case it was the problem and replaced it with echo and cat, which means literally nothing should change:

awk '{ ("echo " $2 " | cat") | getline output; print $2 " - " output}'   connections.txt

For the first thousand lines or so, the left and right sides matched, but then after that, the right side frequently stopped changing:

1.2.3.4:57 - 1.2.3.4:57
2.2.3.4:12 - 2.2.3.4:12
3.2.3.4:24 - 3.2.3.4:24
# .... (okay for a long while)
120.120.3.4:57 - 120.120.3.4:57
121.120.3.4:25 - 120.120.3.4:57
122.120.3.4:100 - 120.120.3.4:57
123.120.3.4:76 - 120.120.3.4:57

What the heck have I done wrong? I'm guessing that I'm misunderstanding something about awk.

CodePudding user response:

Close the command after each invocation to ensure that a new copy of the command is run for the next set of input, e.g.:

awk '{ ("echo " $2 " | transformExternalIp") | getline output
       close("echo " $2 " | transformExternalIp")
       $2=output
       print
     }'

# or, to reduce issues from making a typo:

awk '{ cmd="echo " $2 " | transformExternalIp"
       (cmd) | getline output
       close(cmd)
       $2=output
       print
     }'

For more details see this and this.
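
As a hedged sketch of the same idea, it can also help to check getline's return value, so that a failed read is reported instead of silently reusing the previous value of output. Here cat stands in for transformExternalIp, and the wrapper name rewrite_col2 is made up for this illustration:

```shell
# rewrite_col2 is a hypothetical wrapper name; cat stands in for transformExternalIp.
rewrite_col2() {
    awk '{
        cmd = "echo " $2 " | cat"
        # getline returns 1 on success, 0 at end of input, -1 on error
        if ((cmd | getline output) > 0)
            $2 = output
        else
            print "getline failed for: " cmd > "/dev/stderr"
        close(cmd)    # release the pipe before the next record
        print
    }' "$@"
}

printf '%s\n' '13:46:48 1.2.3.4:57 user1' '13:46:49 5.6.7.8:58 user2' | rewrite_col2
```

With cat as the stand-in, the two sample lines should come out unchanged; any record whose command could not be read is reported on stderr.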


During my testing with a dummy script (echo $RANDOM; sleep .1) I could reproduce results similar to OP's ... some good/expected lines and then a bunch of duplicates.

I noticed that as soon as the duplicates started occurring, the dummy script was no longer being called; awk was instead treating the call as a static result (i.e., it kept reusing the value from the last 'good' call). It was quite noticeable: because sleep .1 was no longer running, the output from the awk script sped up significantly.

Can't say that I understand 100% what's happening under the covers ... perhaps an issue with how the script (my dummy script; OP's transformExternalIp) behaves with multiple lines of input when expecting one line of input ... or a limit on the number of open/active process handles ... shrug
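
One case that can be reproduced deterministically is a repeated input value. Without close(), an identical command string refers to the pipe that is already open and already at end of input, so getline returns 0 and leaves output holding whatever was read last. A small sketch (show_stale is a made-up name, cat is a stand-in, and whether this is what hit OP's data is an assumption):

```shell
# show_stale is a hypothetical name; cat stands in for the real transform.
show_stale() {
    awk '{
        cmd = "echo " $1 " | cat"
        rc = (cmd | getline output)   # deliberately no close(cmd)
        print NR ": input=" $1 " rc=" rc " output=" output
    }'
}

printf '%s\n' x y x | show_stale
# 1: input=x rc=1 output=x
# 2: input=y rc=1 output=y
# 3: input=x rc=0 output=y
```

The third record repeats the first command string, so nothing new is run and output still holds the value from the second record.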

CodePudding user response:

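("echo " $2 " | cat") forks a new process almost every time you use it, because each distinct command string is a new command as far as awk is concerned, and none of them is ever closed.

Then, when the above instruction reaches some limit (presumably on forks or open pipes), getline starts failing and no longer updates the output variable, so it keeps its previous value; that's what's happening here.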

If you're using GNU awk, you can fix the issue with a coprocess, which starts the command once and reuses it for every line:

awk '
    BEGIN { cmd = "cat" }
    {
        print $2 |& cmd
        cmd |& getline output
        print $2 " - " output
    }
' connections.txt