I have two files: FILE1, containing lots of lines, and FILE2, containing KEY VALUE parms. I need to compare FILE2 with FILE1 and, wherever there is a match, replace the corresponding word in FILE1 with the next column in FILE2.
Example:
FILE1:
<SOME YAML CODE
-------------->
PARM1
value:PARM2
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->
FILE2:
PARM1 mmddyy
PARM2 hhmmss
PARM3 awsid
PARM4 cc
So for every match from FILE2 in FILE1, the corresponding word in FILE1 should be replaced with the 2nd column in FILE2. The desired output should look like:
<SOME YAML CODE
-------------->
**mmddyy**
value:**hhmmss**
**awsid**
somyaml_**cc**
<END OF YAML CODE
---------------->
With the help of another community I was able to run the command below, but it works only if the search key is at the start of the line:
awk '
NR==FNR{k[$1]=$2;}
NR!=FNR{if($1 in k){$0=k[$1]};print}
' file2 file1
CodePudding user response:
This awk command should work for your example data.
$ awk -F"[ :_]" 'NR==FNR {array[$1]=$2; next} $1~/PARM/{sub(/PARM./,array[$1])}$2~/PARM/{sub(/PARM./,array[$2])}1' file2 file1
$ cat awk.script
#!/usr/bin/env awk -f
BEGIN {
    FS="[ :_]"                  #Set the delimiters to use
} NR==FNR {
    array[$1]=$2                #Create array from file2
    next                        #Move to file1
} $1~/PARM/ {                   #If column 1 matches PARM
    sub(/PARM./,array[$1])      #Substitute PARMn with the mapped value for column 1
} $2~/PARM/ {                   #If column 2 matches PARM
    sub(/PARM./,array[$2])      #Substitute PARMn with the mapped value for column 2
} 1                             #Print
Output
$ awk -f awk.script file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
awsid
somyaml_cc
<END OF YAML CODE
---------------->
CodePudding user response:
This will do what you want even if you have PARM values that are substrings of other PARM values, PARMs that contain regexp metachars, replacement strings that contain backreferences, or recursive definitions where PARM1 is mapped to PARM2 and PARM2 to PARM1. If file2 defines a mapping of one PARM to another then this just does the mappings in longest-first order. It uses GNU awk for sorted_in so we can visit the PARMs in longest-first order to handle substrings correctly. If you don't have GNU awk then do the sort of file2 outside of the awk script.
$ cat tst.awk
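BEGIN {
    # Visit array elements by numeric value, largest first (GNU awk only)
    PROCINFO["sorted_in"] = "@val_num_desc"
}
NR==FNR {
    map[$1] = $2
    len[$1] = length($1)
    next
}
{
    for ( old in len ) {          # longest PARM first
        new = map[old]
        head = ""
        tail = $0
        # Literal string search, so regexp and backreference chars are safe
        while ( s=index(tail,old) ) {
            head = head substr(tail,1,s-1) new
            tail = substr(tail,s+len[old])
        }
        $0 = head tail
    }
    print
}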
$ awk -f tst.awk file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
awsid
somyaml_cc
<END OF YAML CODE
---------------->
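For awks without sorted_in, the "sort file2 outside of the awk script" idea mentioned above could look something like the sketch below. It assumes a POSIX sh, sort and cut, and that file2 has exactly one key and one value per line separated by a blank; replace_sorted is just a hypothetical helper name, not something from the question.

```shell
#!/bin/sh
# Sketch only: longest-key-first literal replacement using any POSIX awk.
# replace_sorted is a hypothetical helper; $1 is the map file, $2 the target file.
replace_sorted() {
    # Prefix every mapping with its key length, sort longest first, drop the prefix.
    awk '{ print length($1), $0 }' "$1" |
    sort -k1,1nr |
    cut -d' ' -f2- |
    awk '
        NR==FNR {                       # mappings arrive on stdin, longest key first
            old[++n] = $1
            new[n]   = $2
            next
        }
        {
            for (i = 1; i <= n; i++) {
                head = ""
                tail = $0
                # index() does a literal search, so metachars need no escaping
                while ( (s = index(tail, old[i])) > 0 ) {
                    head = head substr(tail, 1, s-1) new[i]
                    tail = substr(tail, s + length(old[i]))
                }
                $0 = head tail
            }
            print
        }
    ' - "$2"
}
```

Called as replace_sorted file2 file1, this should print the same result as tst.awk for the example above. Keys of equal length may be visited in a different order than GNU awk would use, which only matters when one key maps to another key.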
The example provided in the question isn't good though as it only covers the most basic sunny day case where all strings in file2 are unique and comprised of only alphanumeric characters. A better test uses regexp metachars in the first string, strings that are subsets of other strings, and backreference metachars in the replacement strings, e.g.:
$ head file1 file2
==> file1 <==
<SOME YAML CODE
-------------->
A.B
value:A.BX
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->
==> file2 <==
A.B mmddyy
A.BX hhmmss
PARM3 a\1b&c
PARM4 cc
$ awk -f tst.awk file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
a\1b&c
somyaml_cc
<END OF YAML CODE
---------------->
That's still missing cases to consider, e.g. where file2 contains recursive mappings or even just one-off mappings from one "PARM" string to another:
$ head file1 file2
==> file1 <==
<SOME YAML CODE
-------------->
PARM1
value:PARM2
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->
==> file2 <==
PARM1 mmddyy
PARM2 hhmmss
PARM3 PARM4
PARM4 PARM3
$ awk -f tst.awk file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
PARM4
somyaml_PARM4
<END OF YAML CODE
---------------->
but that's not discussed in the question, so I don't know whether that's the expected output or, if not, what the expected output WOULD be and why. Update the question if you need to handle that or any of the other non-trivial cases differently.
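If, purely as an assumption about what might be wanted, every PARM should be replaced exactly once and replacement text should never be rescanned (so PARM3 and PARM4 simply swap), one possible approach is a single left-to-right pass that picks the longest key starting at each position. This is a sketch of that alternative, not a change to tst.awk; replace_once is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: single-pass literal replacement, replaced text is never rescanned.
# replace_once is a hypothetical helper; $1 is the map file, $2 the target file.
replace_once() {
    awk '
        NR==FNR { map[$1] = $2; next }
        {
            out  = ""
            rest = $0
            while (rest != "") {
                best = ""
                # Find the longest key that is a prefix of the remaining text.
                for (k in map)
                    if (length(k) > length(best) && substr(rest, 1, length(k)) == k)
                        best = k
                if (best != "") {       # emit the replacement and skip past the key
                    out  = out map[best]
                    rest = substr(rest, length(best) + 1)
                } else {                # no key starts here: copy one character
                    out  = out substr(rest, 1, 1)
                    rest = substr(rest, 2)
                }
            }
            print out
        }
    ' "$1" "$2"
}
```

With the last file1 and file2 shown above, replace_once file2 file1 would turn PARM3 into PARM4 and somyaml_PARM4 into somyaml_PARM3, leaving the other lines as in the earlier outputs. It checks every key at every character position, so it is slower than the index() loop on large inputs.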