Using awk to replace part of a line with the output of a program


I've got a file with several columns, like so:

13:46:48 user1
13:46:49 user2
13:48:07 user3

I'd like to transform one of the columns by passing it as input to a program:

echo "" | transformExternalIp

I wrote a small bit of awk to do this:

awk '{ ("echo " $2 " | transformExternalIp") | getline output; $2=output; print}'  

But what I got surprised me. Initially, it looked like it was working as expected, but then I started to see weird repeated values. In order to debug, I removed my fancy "transformExternalIp" program in case it was the problem and replaced it with echo and cat, which means literally nothing should change:

awk '{ ("echo " $2 " | cat") | getline output; print $2 " - " output}'   connections.txt

For the first thousand lines or so, the left and right sides matched, but after that (it was okay for a long while) the right side frequently stopped changing.

What the heck have I done wrong? I'm guessing that I'm misunderstanding something about awk.

CodePudding user response:

Close the command after each invocation to ensure a new copy of the command is run for the next set of input, e.g.:

awk '{ ("echo " $2 " | transformExternalIp") | getline output
       close("echo " $2 " | transformExternalIp")
       $2 = output; print }' connections.txt

# or, to reduce issues from making a typo:

awk '{ cmd = "echo " $2 " | transformExternalIp"
       (cmd) | getline output
       close(cmd)
       $2 = output; print }' connections.txt

For more details see this and this.
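
For anyone who wants to try the close() approach without the real program, here is a minimal sketch. It assumes `tr a-z A-Z` as a stand-in for transformExternalIp (which I don't have), and the `transform`, `prog` and `result` names are made up for the example:

```shell
# Stand-in for transformExternalIp (an assumption): upper-case the input.
transform='tr a-z A-Z'

result=$(printf '%s\n' '13:46:48 user1' '13:46:49 user2' '13:48:07 user3' |
awk -v prog="$transform" '{
    cmd = "echo " $2 " | " prog   # the command string differs for every user
    cmd | getline output          # read one line of the command output
    close(cmd)                    # release the pipe before the next record
    $2 = output
    print
}')

printf '%s\n' "$result"
```

Note that $2 is pasted into a shell command unquoted, so this is only safe when the column holds plain words.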

During my testing with a dummy script (echo $RANDOM; sleep .1) I could generate similar results as OP ... some good/expected lines and then a bunch of duplicates.

I noticed that as soon as the duplicates started occurring, the dummy script wasn't actually being called any more; instead awk kept re-using the value from the last 'good' call, as if the command were a static result. It was quite noticeable because the sleep .1 was no longer being run, so the output from the awk script sped up significantly.

Can't say that I understand 100% what's happening under the covers ... perhaps an issue with how the script (my dummy script; OP's transformExternalIp) behaves with multiple lines of input when expecting one line of input ... an issue with a limit on the number of open/active process handles ... shrug
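
One mechanism that can be shown for certain is that getline leaves the variable untouched whenever it doesn't read a line. Whether that is the whole story for OP's file is an assumption. The sketch below uses made-up input in which the value "a" appears twice. Because the command is never closed, the second "echo a" is the same already-exhausted pipe, so getline returns 0 and the old value stays in place:

```shell
# Sample data is made up for illustration: the value "a" appears twice.
result=$(printf '%s\n' a b a |
awk '{
    cmd = "echo " $1
    rc = (cmd | getline out)   # 1 = read a line, 0 = end of input, -1 = error
    print $1, rc, out          # deliberately no close(cmd)
}')

printf '%s\n' "$result"
```

The third output line should read "a 0 b": the return code is 0 and the right-hand value is the stale "b" from the previous record. Printing the return code like this is a cheap way to see where the real script starts failing.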

CodePudding user response:

("echo " $2 " | cat") creates a fork almost every time that you use it.

Then, when the above instruction reaches some kind of fork limit, the output variable isn't updated by getline anymore; that's what's happening here.

If you're using GNU awk then you can fix the issue with a Coprocess:

awk '
    BEGIN { cmd = "cat" }
    {
        print $2 |& cmd          # send the field to the coprocess
        cmd |& getline output    # read one line back from it
        print $2 " - " output
    }
' connections.txt