backreferencing in awk gensub with conditional branching

Time:12-20

I'm referring to the answer to "GNU awk: accessing captured groups in replacement text", but with the ? quantifier applied to the capturing group in the regex.

I would like to use an if statement, the ternary operator ?:, or something more elegant, so that if the regex group backreferenced with \\1 is a non-empty string, one arbitrary string is inserted (that string may include \\1), and if it is empty, some other arbitrary string is inserted. My example works when the capturing group is non-empty, but it doesn't take the expected branch "B" when the backreference is empty. How can I branch conditionally on backreferenced values?

echo abba | awk '{ print gensub(/a(b*)?a/, "\\1"?"A":"B", "g", $0)}'

Answer:

You can assign the result of gensub() to a variable and then use that value in the ternary operator afterwards, something like this:

... | awk '{ v=gensub(/a(b*)?a/, "\\1", "g", $0); print v?"A":"B"}'

Answer:

Something like this, maybe?

$ gawk '{ print gensub(/a(.*)a/, (match($0,/a(b*)?a/)?"A":"B"), "g", $0)}' <<< abba
A

$ gawk '{ print gensub(/a(.*)a/, (match($0,/a(b*)?a/)?"A":"B"), "g", $0)}' <<< acca
B
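One consequence of this approach: the condition is evaluated once per record, before the substitution runs, so every match on that line gets the same replacement. The sketch below is a variation, not the answer's exact command: it swaps in POSIX gsub() for gensub() and uses /ab*a/ as the substitution pattern so it runs in any awk, and the function name one_choice_per_line is hypothetical.

```shell
# Sketch: the ternary is evaluated once per record, before gsub() runs,
# so every ab*a match on the line receives the same replacement string.
# gsub() stands in for gensub() so this runs in any POSIX awk.
one_choice_per_line() {
  awk '{ gsub(/ab*a/, (match($0, /ab+a/) ? "A" : "B")); print }'
}

printf 'XabbaXaaX\n' | one_choice_per_line   # XAXAX
```

The aa match is replaced by A as well, because match() found an abba elsewhere on the same line.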

Answer:

The expressions in any arguments you pass to any function are evaluated before the function is called, so gensub(/a(b*)?a/, "\\1"?"A":"B", "g", $0) is the same as str=("\\1"?"A":"B"); gensub(/a(b*)?a/, str, "g", $0). At that point "\\1" is just a non-empty string constant (a backslash followed by 1), which is always true, so that is the same as gensub(/a(b*)?a/, "A", "g", $0).
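A quick way to check this in any awk is to evaluate the same ternary with no regex involved at all. This is a minimal sketch; the function name show_ternary is hypothetical.

```shell
# Sketch: evaluate the question's ternary on its own.
# In awk source, "\\1" is a two-character string: a backslash followed by 1.
show_ternary() {
  awk 'BEGIN {
    nonempty = ("\\1" ? "A" : "B")   # any non-empty string constant is true
    empty    = ("" ? "A" : "B")      # the empty string is false
    print nonempty, empty, length("\\1")
  }'
}

show_ternary   # A B 2
```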

So you cannot do what you're apparently trying to do with a single call to any function. Nor can you call gsub() twice, once with ab+a and then again with aa, or similar, without breaking the left-to-right, leftmost-longest order in which such a replacement function would match the regexp against the input if it existed.
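To see the ordering problem concretely, here is a sketch of the two-pass idea in plain POSIX awk, run in both orders (the function names plus_first and aa_first are hypothetical). Whichever regex runs first claims text the other one would have matched, so neither order reproduces the single left-to-right pass.

```shell
# Sketch: emulating one conditional replacement with two gsub() passes.
plus_first() { awk '{ gsub(/ab+a/, "A"); gsub(/aa/, "B"); print }'; }
aa_first()   { awk '{ gsub(/aa/, "B"); gsub(/ab+a/, "A"); print }'; }

printf 'aabbaabba\n' | plus_first   # aAA   (a single pass would give BbbBbba)
printf 'abbaaabba\n' | aa_first     # abbBA (a single pass would give ABbba)
```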

It looks like you might be trying to do the following, using GNU awk for patsplit():

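awk '
    # gawk only: f[] gets the ab*a matches, s[] the text around them;
    # the action runs only when at least one match was found
    n = patsplit($0,f,/ab*a/,s) {
        $0 = s[0]
        for ( i=1; i<=n; i++ ) {
            # at least one b between the outer a characters -> "A", otherwise "B"
            $0 = $0 (f[i] ~ /ab+a/ ? "A" : "B") s[i]
        }
    }
1'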

or with any awk:

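awk '
    {
        head = ""
        # consume the record one leftmost-longest ab*a match at a time
        while ( match($0,/ab*a/) ) {
            str = substr($0,RSTART,RLENGTH)
            # keep the text before the match, then "A" if the match contains a b, else "B"
            head = head substr($0,1,RSTART-1) (str ~ /ab+a/ ? "A" : "B")
            $0 = substr($0,RSTART+RLENGTH)
        }
        $0 = head $0
    }
1'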

but without sample input/output it's a guess. FWIW given this sample input file:

$ cat file
XabbaXaaXabaX
foo
abbaabba
aabbaabba
bar
abbaaabba

the above will output:

XAXBXAX
foo
AA
BbbBbba
bar
ABbba
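The question also says the non-empty branch may include \\1 itself. Extending the any-awk loop, the captured text can be sliced out of each match and used in the replacement. This is a minimal sketch assuming the pattern is exactly a, then b*, then a, so the group is everything between the first and last character of each match; the function name branch_on_group and the [..] wrapper are illustrative choices, not anything from the question.

```shell
# Sketch: per-match branch on the captured group, using the group text itself.
# Assumes the pattern is a, b*, a, so the group (what \1 would hold)
# is everything between the first and last character of each match.
branch_on_group() {
  awk '{
    out = ""
    while (match($0, /ab*a/)) {
      grp = substr($0, RSTART + 1, RLENGTH - 2)   # text between the outer a characters
      out = out substr($0, 1, RSTART - 1) (grp != "" ? "[" grp "]" : "B")
      $0 = substr($0, RSTART + RLENGTH)          # continue after this match
    }
    print out $0
  }'
}

printf 'XabbaXaaXabaX\n' | branch_on_group   # X[bb]XBX[b]X
```

Because it reuses the same leftmost-longest loop, it picks the same matches as the scripts above; only the replacement text differs.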