Search for the first and second variables (with spaces and special characters like $) using awk

I have a dataset in which I need to search for two variables. Both must be present on a line; otherwise the line should be ignored.

inputfile.txt:

IFRA-SCN-01001B.brz.com Tower Sales
IFRA-SCN-01001B.brz.com Z$
IFRA-SCN-01001B.brz.com Pre-code$
IFRA-SCN-01001B.brz.com Technical Stuff
IFRA-SCN-01001B.brz.com expired$
IFRA-SCN-01001B.brz.com AA$
IFRA-SCN-01002B.brz.com Build Docs
IFRA-SCN-01002B.brz.com Build Docs

BigFile.txt:

\\IFRA-SCN-01001B.brz.com\ABC PTR,[email protected]
\\IFRA-SCN-01001B.brz.com\ABC PTR,[email protected]
\\IFRA-SCN-01001B.brz.com\bitshare\DOC TRIGGER,[email protected]
\\IFRA-SCN-01001B.brz.com\bitshare,[email protected]
\\IFRA-SCN-01001B.brz.com\bitshare\PFM FRAUD,[email protected]
\\IFRA-SCN-01001B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01001B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]

It works if I use the actual strings, but not if they are assigned to variables.

[root@brzmgmt]$ awk '/Build Docs/{ok=1;s=NR}ok && NR<=s+2 && /IFRA-SCN-01002B.brz.com/{print $0}' BigFile.txt
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]


while read -r zz; do
        var1=`echo $zz | print '{print $1}'`
        var2=`echo $zz | print '{print $2}'`
        awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
        awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
        fi
done < inputfile.txt

Any idea what I am missing?

awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 -v b=$var2 '/$b/{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING

CodePudding user response：

I see a number of problems here. First, where you split the fields from inputfile.txt with

while read -r zz; do
    var1=`echo $zz | print '{print $1}'`
    var2=`echo $zz | print '{print $2}'`

When the line is something like "IFRA-SCN-01002B.brz.com Build Docs", var1 will be set correctly, but var2 will only get "Build", not "Build Docs". I assume you want the latter? If so, I'd let read do the splitting for you:

while read -r var1 var2

...which will automatically include any "extra fields" (e.g. "Docs") in the last variable. If you don't want the full remainder of the line, just add an extra variable to hold anything beyond the second field:

while read -r var1 var2 ignoredstuff

See BashFAQ #1: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
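To see how read distributes the fields, here is a minimal sketch using one sample line from inputfile.txt. The here-document and the extra variable names (first, second, rest) are only there for the demonstration; in the real loop the input comes from the file.

```shell
# Sample line taken from inputfile.txt.
line='IFRA-SCN-01002B.brz.com Build Docs'

# Two variables: the last one receives the whole remainder of the line.
read -r var1 var2 <<EOF
$line
EOF
printf 'var1=[%s] var2=[%s]\n' "$var1" "$var2"

# Three variables: anything beyond the second field lands in the third.
read -r first second rest <<EOF
$line
EOF
printf 'first=[%s] second=[%s] rest=[%s]\n' "$first" "$second" "$rest"
```

With two variables, var2 holds "Build Docs"; with three, second holds only "Build" and rest holds "Docs".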

As for the awk commands, the first one doesn't work because the shell doesn't expand variables inside single-quotes. You could switch to double-quotes, but then you'd have to escape $0 to keep the shell from expanding that, and you'd also have to worry about the search strings possibly including awk syntax, and it's generally a mess. The second method, with -v, is a lot better, but you still have to fix a couple of things.
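A quick way to convince yourself of the quoting behaviour is to look at the program text the shell would hand to awk. This is only an illustration; the variable names single and double are made up for it.

```shell
var2='Build Docs'

# Single quotes: the shell passes the text through unchanged,
# so awk would receive the literal characters /$var2/.
single='/$var2/'

# Double quotes: the shell expands $var2 before awk ever sees the program.
double="/$var2/"

printf 'single-quoted: %s\n' "$single"
printf 'double-quoted: %s\n' "$double"
```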

In the -v a=$var1 b=$var2 part, you should double-quote the variables so the shell doesn't split them if they contain spaces (like "Build Docs"), and give each assignment its own -v: -v a="$var1" -v b="$var2". You should pretty much always double-quote variable references to prevent problems like this.
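The word-splitting problem is easy to reproduce. In this small sketch, count_args is a hypothetical helper that just reports how many arguments it received; it assumes the default IFS.

```shell
var2='Build Docs'

# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

unquoted=$(count_args $var2)     # the shell splits the value into Build and Docs
quoted=$(count_args "$var2")     # the value stays one argument

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' "$unquoted" "$quoted"
```

Unquoted, awk would see b=Build as one argument and Docs as another, which it would then try to treat as something else entirely (for example an input file name).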

Also, the way you use those a and b variables in the awk command isn't right. $ in awk doesn't mean "substitute a variable" like it does in shell; it generally means "get a field by number" (e.g. $2 gets the second field, and $(x+2) gets the x-plus-second field). Also, in a /.../ pattern, variables (and field references) don't get substituted anyway. So what you probably want instead of /$a/ and /$b/ is $0~a and $0~b (note that ~ is awk's regex match operator).
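Both points can be checked on a throwaway line. The sample text and the variable names x and pat below are made up for the illustration.

```shell
line='alpha beta gamma'

# $2 is the second field; $(x+1) is the field whose number is x+1.
fields=$(printf '%s\n' "$line" | awk -v x=2 '{ print $2, $(x+1) }')

# $0 ~ pat matches the line against the regex stored in the awk variable pat;
# a pattern with no action prints the matching line.
matched=$(printf '%s\n' "$line" | awk -v pat='be.a' '$0 ~ pat')

printf 'fields=[%s]\nmatched=[%s]\n' "$fields" "$matched"
```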

So the command should be something like this:

awk -v a="$var1" -v b="$var2" '$0~b{ok=1;s=NR}ok && NR<=s+2 && $0~a{print $0}' BigFile.txt

Except... you might not want that, because it treats the strings as regular expressions rather than plain strings. So the . characters in "IFRA-SCN-01001B.brz.com" will match any single character, and the $ in "Pre-code$" will be treated as an end-of-string anchor rather than a literal character. If you just want them matched as literal strings, use e.g. index($0,b) instead:

awk -v a="$var1" -v b="$var2" 'index($0,b){ok=1;s=NR}ok && NR<=s+2 && index($0,a){print $0}' BigFile.txt
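The difference between the two approaches shows up clearly with the values from inputfile.txt. This is a small sketch; the look-alike hostname is invented purely to show the wildcard effect of the dots.

```shell
line='IFRA-SCN-01001B.brz.com Pre-code$'
needle='Pre-code$'

# Regex match: the trailing $ is an end-of-string anchor, so the literal "$"
# at the end of the line prevents a match.
regex_hit=$(printf '%s\n' "$line" | awk -v b="$needle" '$0 ~ b { print "match" }')

# Literal match: index() returns the position of the substring, or 0 if absent.
literal_hit=$(printf '%s\n' "$line" | awk -v b="$needle" 'index($0, b) { print "match" }')

# Under ~ each dot matches any character, so this invented look-alike matches too.
lookalike='IFRA-SCN-01001BxbrzXcom'
dot_hit=$(printf '%s\n' "$lookalike" | awk -v a='IFRA-SCN-01001B.brz.com' '$0 ~ a { print "match" }')

printf 'regex=[%s] literal=[%s] dot=[%s]\n' "$regex_hit" "$literal_hit" "$dot_hit"
```

Here the regex test finds nothing for "Pre-code$", index() finds it, and the regex test wrongly accepts the look-alike hostname.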

I'd also recommend running your scripts through shellcheck.net to catch common mistakes and bad practices.

Finally, I have to ask what's up with all the ok and s stuff. That looks like it's going to insert some weird inter-record dependencies that don't make any sense. Also, if the fields are always going to be in that same order, would a grep search be simpler?
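If the share name always follows the host name directly, as in \\host\share,owner, a fixed-string grep might be enough. The following is only a sketch under that assumption: the temporary file stands in for BigFile.txt, and owner@example.com is a placeholder for the addresses in the real data.

```shell
# Stand-in for BigFile.txt (placeholder owner addresses).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
\\IFRA-SCN-01001B.brz.com\Build Docs,owner@example.com
\\IFRA-SCN-01002B.brz.com\Build Docs,owner@example.com
\\IFRA-SCN-01001B.brz.com\bitshare,owner@example.com
EOF

var1='IFRA-SCN-01002B.brz.com'
var2='Build Docs'

# Build the literal text \\host\share, from the two values.
prefix='\\'
sep='\'
pattern="${prefix}${var1}${sep}${var2},"

# -F treats the pattern as a fixed string, so ., $ and \ are all literal.
hits=$(grep -F -- "$pattern" "$tmp")
printf '%s\n' "$hits"

rm -f "$tmp"
```

The trailing comma in the pattern keeps "Build Docs" from also matching a longer share name that merely starts with it.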

CodePudding user response：

Your code is:

while read -r zz; do
        var1=`echo $zz | print '{print $1}'`
        var2=`echo $zz | print '{print $2}'`
        awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
        awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
        fi
done < inputfile.txt

I won't address the logic of the code (e.g. you should probably be using match($0,"...") instead of /.../, and I don't know what the test NR<=s+2 is for), but here are some syntax and efficiency issues:

You appear to want to read a line of whitespace-delimited text into two variables. This is more simply done with just: read -r var1 var2 or read -r var1 var2 junk
print is not a standard shell command. Perhaps this is meant to be an awk script (awk '{print $1}', etc)? But just use simple read instead.
Single-quotes prevent variable expansion so, inside the script argument passed to awk, /$var/ will literally look for dollar, v, a, r. Pass variables using awk's -v option as you do in the second awk line.
Each variable passed to awk needs a separate -v option.
awk does not use $name to reference variable values, simply name. To use a variable as a regex, just use it in the right place: e.g. $0 ~ name.

So:

while read -r var1 var2 junk; do
    # quote variables to prevent globbing, word-splitting, etc
    awk -v var1="$var1" -v var2="$var2" '
        $0 ~ var2 { ok=1; s=NR }
        ok && NR<=s+2 && $0 ~ var1 ; # print is default action
    ' BigFile.txt
done <inputfile.txt

Note that the more var1/var2 you want to check, the longer the runtime ( O(mn) : m sets of var1/var2 and n lines of input to check ). There may be more efficient algorithms if the problem is better-specified.
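As one possible direction, here is a single-pass sketch that reads both files in one awk invocation and uses a lookup table instead of rerunning awk per pair. It rests on assumptions the question does not confirm: every BigFile.txt line has the form \\host\share,owner, every inputfile.txt line is "host share" separated by one space, and an exact match on host plus share is what is wanted. The temporary files and owner@example.com are placeholders.

```shell
dir=$(mktemp -d)

cat > "$dir/inputfile.txt" <<'EOF'
IFRA-SCN-01002B.brz.com Build Docs
IFRA-SCN-01001B.brz.com Pre-code$
EOF

cat > "$dir/BigFile.txt" <<'EOF'
\\IFRA-SCN-01001B.brz.com\Build Docs,owner@example.com
\\IFRA-SCN-01002B.brz.com\Build Docs,owner@example.com
\\IFRA-SCN-01001B.brz.com\bitshare,owner@example.com
EOF

result=$(awk '
    NR == FNR {
        # First file: remember each host/share pair as the key \\host\share.
        share = substr($0, length($1) + 2)
        want["\\\\" $1 "\\" share] = 1
        next
    }
    {
        # Second file: strip the trailing ,owner part and look the rest up.
        key = $0
        sub(/,[^,]*$/, "", key)
        if (key in want) print
    }
' "$dir/inputfile.txt" "$dir/BigFile.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Because the keys are compared as plain strings, characters such as . and $ need no escaping, and each line of BigFile.txt is read only once regardless of how many pairs inputfile.txt contains.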