I've recently been trying to do the following in awk. We have two files (F1.txt and F2.txt.gz). While streaming from the second one, I want to replace every occurrence of an entry from F1.txt with a substring of that entry. I got this far:
zcat F2.txt.gz |
awk 'NR==FNR {a[$1]; next}
{for (i in a)
$0=gsub(i, substr(i, 0, 2), $0) #this does not work of course
}
{print $0}
' F1.txt -
I was wondering how to do this properly in awk. Thanks!
CodePudding user response:
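Try changing
$0=gsub(i, substr(i, 0, 2), $0)
into
gsub(i, substr(i, 1, 2))
gsub() returns the number of substitutions it made, not the modified string, so assigning its result to $0 overwrites the line with a count. It modifies its target ($0 by default) in place. Also, awk strings are indexed from 1; a start index of 0 behaves differently across awk implementations and may return fewer characters than expected.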
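Putting this together, a minimal sketch of the corrected script might look like the following. The sample data and the printf standing in for zcat F2.txt.gz are assumptions for illustration only, and substr(i, 1, 2) is used because awk strings are indexed from 1.

```shell
# Hypothetical sample data; printf stands in for: zcat F2.txt.gz
tmp=$(mktemp -d)
printf 'Azerbaijan\nCanada\n' > "$tmp/F1.txt"

out=$(printf 'Baku is in Azerbaijan\nToronto is in Canada\n' |
  awk 'NR==FNR { a[$1]; next }       # first file: remember each entry
       { for (i in a)
           gsub(i, substr(i, 1, 2))  # edits $0 in place, result not assigned
         print
       }' "$tmp/F1.txt" -)

echo "$out"
rm -rf "$tmp"
```

Keep in mind that gsub treats i as a regular expression, so entries containing characters such as . or * would need escaping before being used this way.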
CodePudding user response:
$0=gsub(i, substr(i, 0, 2), $0) #this does not work of course
GNU AWK's gsub function modifies its third argument in place (so that argument must be assignable) and returns the number of substitutions made. If you only want the altered value, you can ignore the return value.
Consider the following simple example. Let file1.txt contain
a x
b y
c z
and let file2.txt contain
quick fox jumped over lazy dog
then
awk 'FNR==NR{arr[$1]=$2;next}{for(i in arr){gsub(i,arr[i],$0)};print}' file1.txt file2.txt
gives output
quizk fox jumped over lxzy dog
Be warned that if your replacements form a chain, for example
a b
b c
then the output depends on array traversal order, because for (i in arr) visits the keys in an unspecified order.
(tested in gawk 4.2.1)
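If the order matters, one possible workaround (a sketch, not the only way) is to store the pairs under a numeric index and apply them in the order they appear in the mapping file. The names map.txt, from, to and n below are made up for illustration.

```shell
# Hypothetical mapping file containing the chain a -> b, b -> c
tmp=$(mktemp -d)
printf 'a b\nb c\n' > "$tmp/map.txt"

out=$(printf 'abc\n' |
  awk 'NR==FNR { n++; from[n] = $1; to[n] = $2; next }  # keep file order
       { for (k = 1; k <= n; k++)                       # apply rules 1..n in turn
           gsub(from[k], to[k])
         print
       }' "$tmp/map.txt" -)

echo "$out"
rm -rf "$tmp"
```

Here a becomes b first and then every b becomes c, so the line abc always comes out as ccc. With for (i in arr) the second rule could run first and give bcc instead.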
CodePudding user response:
Please correct my assumptions if they are wrong.
You have two files; the first contains a set of entries. Wherever the second file contains one of these words, replace it with its first two characters.
Example:
==> file1 <==
Azerbaijan
Belarus
Canada
==> file2 <==
Caspian sea is in Azerbaijan
Belarus is in Europe
Canada is in metric system.
$ awk 'NR==FNR {a[$1]; next}
{for(i=1;i<=NF;i++)
if($i in a) $i=substr($i,1,2)}1' file1 file2
Caspian sea is in Az
Be is in Europe
Ca is in metric system.
Note that substring indices start at 1 in awk.
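A quick way to check the 1-based indexing yourself (the variable s and the sample word are just for illustration):

```shell
# substr(s, 1, 2) is the first two characters; substr(s, 3) is the rest
out=$(awk 'BEGIN { s = "Canada"; print substr(s, 1, 2), substr(s, 3) }')
echo "$out"
```

This prints Ca nada, which shows that position 1 is the first character of the string.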