I have a dataset in which I need to search for two variables. Both must be present; otherwise, ignore the entry.
inputfile.txt:
IFRA-SCN-01001B.brz.com Tower Sales
IFRA-SCN-01001B.brz.com Z$
IFRA-SCN-01001B.brz.com Pre-code$
IFRA-SCN-01001B.brz.com Technical Stuff
IFRA-SCN-01001B.brz.com expired$
IFRA-SCN-01001B.brz.com AA$
IFRA-SCN-01002B.brz.com Build Docs
IFRA-SCN-01002B.brz.com Build Docs
BigFile.txt:
\\IFRA-SCN-01001B.brz.com\ABC PTR,[email protected]
\\IFRA-SCN-01001B.brz.com\ABC PTR,[email protected]
\\IFRA-SCN-01001B.brz.com\bitshare\DOC TRIGGER,[email protected]
\\IFRA-SCN-01001B.brz.com\bitshare,[email protected]
\\IFRA-SCN-01001B.brz.com\bitshare\PFM FRAUD,[email protected]
\\IFRA-SCN-01001B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01001B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]
It works if I use the literal strings, but not when they are assigned to variables:
[root@brzmgmt]$ awk '/Build Docs/{ok=1;s=NR}ok && NR<=s+2 && /IFRA-SCN-01002B.brz.com/{print $0}' BigFile.txt
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]
\\IFRA-SCN-01002B.brz.com\Build Docs,[email protected]
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
Any idea what I'm missing?
awk -v a=$var1 -v b=$var2 '/$b/{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
CodePudding user response:
I see a number of problems here. First, where you split the fields from inputfile.txt with
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
When the line is something like "IFRA-SCN-01002B.brz.com Build Docs", var1 will be set correctly (assuming print there is meant to be awk), but var2 will only get "Build", not "Build Docs". I assume you want the latter? If so, I'd let read do the splitting for you:
while read -r var1 var2
...which will automatically include any "extra fields" (e.g. "Docs") in the last variable. If you don't want the full remainder of the line, just add an extra variable to hold anything beyond the second field:
while read -r var1 var2 ignoredstuff
See BashFAQ #1: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
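A minimal sketch of the difference between the two read forms. split_line and split_line_with_junk are hypothetical helpers (not from the question), and each reads one line from standard input:

```shell
# split_line and split_line_with_junk are hypothetical helpers; each reads one line from stdin.
split_line() {
    # var1 gets the first field; var2 gets the rest of the line, spaces included
    read -r var1 var2
    printf 'var1=[%s] var2=[%s]\n' "$var1" "$var2"
}

split_line_with_junk() {
    # var2 gets only the second field; anything after it lands in junk
    read -r var1 var2 junk
    printf 'var1=[%s] var2=[%s] junk=[%s]\n' "$var1" "$var2" "$junk"
}

printf '%s\n' 'IFRA-SCN-01002B.brz.com Build Docs' | split_line
# prints: var1=[IFRA-SCN-01002B.brz.com] var2=[Build Docs]

printf '%s\n' 'IFRA-SCN-01002B.brz.com Build Docs' | split_line_with_junk
# prints: var1=[IFRA-SCN-01002B.brz.com] var2=[Build] junk=[Docs]
```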
As for the awk commands, the first one doesn't work because the shell doesn't expand variables inside single quotes. You could switch to double quotes, but then you'd have to escape $0 to keep the shell from expanding it, you'd also have to worry about the search strings possibly containing awk syntax, and it's generally a mess. The second method, with -v, is a lot better, but you still have to fix a couple of things.
In the -v a=$var1 b=$var2 part, each assignment needs its own -v, and you should double-quote the variables so the shell doesn't split them if they contain spaces (like "Build Docs"): -v a="$var1" -v b="$var2". You should pretty much always double-quote variable references to prevent problems like this.
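A small sketch of the quoted, one -v per variable form. show_vars is a hypothetical helper that only echoes the two values back from awk, so you can see that "Build Docs" arrives intact:

```shell
# show_vars is a hypothetical helper: it passes its two arguments to awk and prints them.
show_vars() {
    # One -v per assignment, and double quotes so "Build Docs" stays a single word
    awk -v a="$1" -v b="$2" 'BEGIN { print "a=" a; print "b=" b }'
}

show_vars 'IFRA-SCN-01002B.brz.com' 'Build Docs'
# prints:
# a=IFRA-SCN-01002B.brz.com
# b=Build Docs

# By contrast, in the question's  awk -v a=$var1 b=$var2 '...' BigFile.txt
# there is no -v before b=..., so awk takes "b=$var2" (or, after word
# splitting, just "b=Build") as its program text.
```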
Also, the way you use those a and b variables in the awk command isn't right. $ in awk doesn't mean "substitute a variable" like it does in shell; it means "get a field by number" (e.g. $2 gets the second field, and $(x+2) gets field number x+2). Also, in a /.../ pattern, variables (and field references) don't get substituted anyway. So what you probably want instead of /$a/ and /$b/ is $0~a and $0~b (note that ~ is awk's regex match operator).
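To see the ~ form in isolation, here is a sketch with a hypothetical both_regex helper. It assumes both strings must occur on the same line (it leaves out the ok/s logic), and the sample lines are made up in the style of BigFile.txt:

```shell
# both_regex is a hypothetical helper: it prints the stdin lines that match
# BOTH regexes. It assumes both strings must be on the same line and leaves
# out the question's ok/s logic.
both_regex() {
    # a and b are awk variables; ~ uses their values as (dynamic) regexes
    awk -v a="$1" -v b="$2" '$0 ~ a && $0 ~ b'
}

# Made-up sample lines in the style of BigFile.txt
printf '%s\n' \
    '\\IFRA-SCN-01001B.brz.com\Build Docs,user1' \
    '\\IFRA-SCN-01002B.brz.com\Build Docs,user2' \
    '\\IFRA-SCN-01002B.brz.com\bitshare,user3' |
    both_regex 'IFRA-SCN-01002B.brz.com' 'Build Docs'
# prints: \\IFRA-SCN-01002B.brz.com\Build Docs,user2
```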
So the command should be something like this:
awk -v a="$var1" -v b="$var2" '$0~b{ok=1;s=NR}ok && NR<=s+2 && $0~a{print $0}' BigFile.txt
Except... you might not want that, because it treats the strings as regular expressions rather than plain strings. So the . characters in "IFRA-SCN-01001B.brz.com" will match any single character, and the $ in "Pre-code$" will be treated as an end-of-string anchor rather than a literal character. If you just want them matched as literal strings, use e.g. index($0,b) instead:
awk -v a="$var1" -v b="$var2" 'index($0,b){ok=1;s=NR}ok && NR<=s+2 && index($0,a){print $0}' BigFile.txt
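A quick sketch of the regex-versus-literal difference, using a hypothetical count_matches helper that counts how many stdin lines match each way:

```shell
# count_matches is a hypothetical helper: prints "regex=N literal=M" for pattern $1 over stdin.
count_matches() {
    awk -v p="$1" '
        $0 ~ p       { re++ }    # p used as an extended regular expression
        index($0, p) { lit++ }   # p used as a plain substring
        END { printf "regex=%d literal=%d\n", re, lit }
    '
}

# "$" is an end-of-string anchor in a regex, so the regex misses the literal "$"
printf '%s\n' 'IFRA-SCN-01001B.brz.com Pre-code$' | count_matches 'Pre-code$'
# prints: regex=0 literal=1

# "." matches any character in a regex, so the regex also accepts "brzXcom"
printf '%s\n' 'IFRA-SCN-01001B.brzXcom' | count_matches 'brz.com'
# prints: regex=1 literal=0
```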
I'd also recommend running your scripts through shellcheck.net to catch common mistakes and bad practices.
Finally, I have to ask what's up with all the ok and s stuff. That looks like it's going to introduce some weird inter-record dependencies that don't make any sense. Also, if the fields are always going to be in that same order, would a grep search be simpler?
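A sketch of that grep idea, assuming each inputfile.txt line is "host share" and that the share appears right after the host in the BigFile.txt path (\\host\share,...). find_pairs_grep is a hypothetical name, and the trailing-comma choice is an assumption you may want to drop:

```shell
# find_pairs_grep is a hypothetical helper. ASSUMPTION: each line of the pairs
# file is "host share", and the big file has lines like \\host\share,...
# grep -F matches a fixed string, so ".", "$" and "\" have no special meaning.
find_pairs_grep() {
    # $1 = pairs file (e.g. inputfile.txt), $2 = big file (e.g. BigFile.txt)
    while read -r var1 var2; do
        # Inside double quotes "\\" is one backslash, so the search string is
        # host\share, (the trailing comma requires the share to end the path;
        # drop it to also match subfolders such as bitshare\DOC TRIGGER)
        grep -F -- "$var1\\$var2," "$2" || true   # grep exits 1 when nothing matches
    done < "$1"
}

# usage: find_pairs_grep inputfile.txt BigFile.txt
```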
CodePudding user response:
Your code is:
while read -r zz; do
var1=`echo $zz | print '{print $1}'`
var2=`echo $zz | print '{print $2}'`
awk '/$var2/{ok=1;s=NR}ok && NR<=s+2 && /$va1/{print $0}' BigFile.txt <--NOT_WORKING
awk -v a=$var1 b=$var2 '/$b//{ok=1;s=NR}ok && NR<=s+2 && /$a/{print $0}' BigFile.txt <--NOT_WORKING
fi
done < inputfile.txt
I won't address the logic of the code (e.g. you should probably be using match($0,"...") instead of /.../, and I don't know what the test NR<=s+2 is for), but here are some syntax and efficiency issues:
- You appear to want to read a line of whitespace-delimited text into two variables. This is more simply done with just read -r var1 var2 or read -r var1 var2 junk.
- print is not a standard shell command. Perhaps this is meant to be an awk script (awk '{print $1}', etc.)? But just use simple read instead.
- Single quotes prevent shell variable expansion, so inside the script argument passed to awk, /$var/ is not replaced by the contents of $var. awk sees a regex in which $ is an end-of-string anchor followed by v, a, r, so it never matches. Pass variables using awk's -v option, as you do in the second awk line.
- Each variable passed to awk needs a separate -v option.
- awk does not use $name to reference variable values, simply name. To use a variable as a regex, just use it in the right place: e.g. $0 ~ name.
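The $name point can be checked directly. In this sketch (demo_fields, demo_slash and demo_var_regex are hypothetical names), $name selects a field, a name inside /.../ is just regex text, and $0 ~ name uses the variable's value:

```shell
# demo_fields, demo_slash and demo_var_regex are hypothetical helpers for illustration.
demo_fields() {
    # name is the variable's value (2); $name is field number 2 ("beta")
    printf '%s\n' 'alpha beta gamma' | awk -v name=2 '{ print name; print $name }'
}

demo_slash() {
    # Inside /.../ the text is a regex literal, so this matches the word "name",
    # not the variable's value "beta"
    printf '%s\n' 'name' 'beta' | awk -v name=beta '/name/'
}

demo_var_regex() {
    # $0 ~ name uses the value of name ("beta") as the regex
    printf '%s\n' 'name' 'beta' | awk -v name=beta '$0 ~ name'
}

demo_fields      # prints 2, then beta
demo_slash       # prints name
demo_var_regex   # prints beta
```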
So:
while read -r var1 var2 junk; do
    # quote variables to prevent globbing, word-splitting, etc
    # one -v per variable; the awk names must match those used in the script
    awk -v var1="$var1" -v var2="$var2" '
        $0 ~ var2 { ok=1; s=NR }
        ok && NR<=s+2 && $0 ~ var1 ; # print is default action
    ' BigFile.txt
done <inputfile.txt
Note that the more var1/var2 pairs you check, the longer the runtime: O(mn) for m pairs of var1/var2 and n lines of input, because BigFile.txt is read again for every pair. There may be more efficient algorithms if the problem is better specified.
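One such alternative, sketched under an assumption the question doesn't confirm: a BigFile.txt line counts as a match when the host and share in its \\host\share[\...] path exactly equal a pair from inputfile.txt. awk can then load the pairs into an array and read each file once, roughly O(m + n), instead of rereading BigFile.txt per pair. find_pairs_awk is a hypothetical name:

```shell
# find_pairs_awk is a hypothetical helper. ASSUMPTION: a big-file line matches
# when the host and share of its \\host\share[\...] path exactly equal a
# "host share" pair from the pairs file.
find_pairs_awk() {
    # $1 = pairs file (e.g. inputfile.txt), $2 = big file (e.g. BigFile.txt)
    awk '
        NR == FNR {                                   # first file: the pairs
            if (NF < 2) next                          # skip lines without a share
            share = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", share)    # everything after field 1
            want[$1, share] = 1
            next
        }
        {                                             # second file: one scan
            path = $0
            sub(/,.*/, "", path)                      # drop "," and what follows
            sub(/^\\\\/, "", path)                    # drop the leading \\
            n = split(path, part, /\\/)               # part[1]=host, part[2]=share
            if (n >= 2 && ((part[1], part[2]) in want)) print
        }
    ' "$1" "$2"
}

# usage: find_pairs_awk inputfile.txt BigFile.txt
```

Unlike the per-pair loop, each BigFile.txt line is printed at most once, even though inputfile.txt lists "IFRA-SCN-01002B.brz.com Build Docs" twice.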