File Search in LINUX

Time:09-21

I have two files: FILE1 contains many lines and FILE2 contains KEY VALUE params. I need to compare FILE2 with FILE1 and, wherever a key from FILE2 matches in FILE1, replace the matching word in FILE1 with the value from the next column in FILE2.

Example:

FILE1:

<SOME YAML CODE
-------------->
PARM1
value:PARM2
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->

FILE2:

PARM1 mmddyy
PARM2 hhmmss
PARM3 awsid
PARM4 cc

So for every match of a FILE2 key in FILE1, the corresponding word in FILE1 should be replaced with the 2nd column of FILE2. The desired output should look like this:

<SOME YAML CODE
-------------->
**mmddyy**
value:**hhmmss**
**awsid**
somyaml_**cc**
<END OF YAML CODE
---------------->

With the help of another community I was able to run the command below, but it works only if the search key is at the start of the line:

awk '
    NR==FNR{k[$1]=$2;}
    NR!=FNR{if($1 in k){$0=k[$1]};print}
' file2 file1

CodePudding user response:

This awk should work for your example data.

$ awk -F"[ :_]" 'NR==FNR {array[$1]=$2; next} $1~/PARM/{sub(/PARM./,array[$1])}$2~/PARM/{sub(/PARM./,array[$2])}1' file2 file1
$ cat awk.script
#!/usr/bin/awk -f

BEGIN {
    FS="[ :_]"                           #Set the delimiters to use
} NR==FNR {                              
    array[$1]=$2                         #Create array from file2
    next                                 #Move to file1
} $1~/PARM/ {                            #If column1 matches PARM
    sub(/PARM./,array[$1])               #Substitute PARM for content of array
} $2~/PARM/ {                            #If column 2 matches PARM
    sub(/PARM./,array[$2])               #Substitute PARM for content array
} 1                                      #Print

Output

$ awk -f awk.script file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
awsid
somyaml_cc
<END OF YAML CODE
---------------->

CodePudding user response:

This will do what you want even if some PARM strings are substrings of other PARM strings, if PARMs contain regexp metachars, if replacement strings contain backreference characters, or if the definitions are recursive (PARM1 mapped to PARM2 and PARM2 mapped to PARM1). If file2 maps one PARM to another, the script simply applies the mappings in longest-first order. It uses GNU awk for sorted_in so that the PARMs are visited longest first, which handles substrings correctly; if you don't have GNU awk, sort file2 outside of the awk script instead.

$ cat tst.awk
BEGIN {
    PROCINFO["sorted_in"] = "@val_num_desc"
}
NR==FNR {
    map[$1] = $2
    len[$1] = length($1)
    next
}
{
    for ( old in len ) {
        new = map[old]
        head = ""
        tail = $0
        while ( s=index(tail,old) ) {
            head = head substr(tail,1,s-1) new
            tail = substr(tail,s+len[old])
        }
        $0 = head tail
    }
    print
}

$ awk -f tst.awk file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
awsid
somyaml_cc
<END OF YAML CODE
---------------->
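
The answer mentions that without GNU awk the sort of file2 can be done outside the script. A minimal sketch of that idea follows, assuming keys contain no whitespace. The temporary directory, the file2.sorted name and the keys array are illustrative choices, not part of the original answer. The sample data exercises keys that are substrings of other keys.

```shell
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
<SOME YAML CODE
-------------->
A.B
value:A.BX
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->
EOF

cat > "$tmp/file2" <<'EOF'
A.B mmddyy
A.BX hhmmss
PARM3 a\1b&c
PARM4 cc
EOF

# Prefix each line with the length of its key, sort longest first,
# then drop the length prefix again.
awk '{ print length($1), $0 }' "$tmp/file2" |
    sort -k1,1nr |
    cut -d' ' -f2- > "$tmp/file2.sorted"

result=$(awk '
NR==FNR {
    if ( NF ) {
        keys[++n] = $1      # remember keys in the already sorted input order
        map[$1]   = $2
    }
    next
}
{
    for ( i = 1; i <= n; i++ ) {
        old  = keys[i]
        new  = map[old]
        head = ""
        tail = $0
        while ( s = index(tail, old) ) {
            head = head substr(tail, 1, s-1) new
            tail = substr(tail, s + length(old))
        }
        $0 = head tail
    }
    print
}
' "$tmp/file2.sorted" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Only POSIX awk features are used here: the numeric for loop over keys replaces the sorted_in traversal, so the visiting order comes entirely from the external sort.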

The example provided in the question isn't a good test, though, as it only covers the most basic sunny day case where all strings in file2 are unique and consist only of alphanumeric characters. A better test uses regexp metachars in the first string, strings that are substrings of other strings, and backreference metachars in the replacement strings, e.g.:

$ head file1 file2
==> file1 <==
<SOME YAML CODE
-------------->
A.B
value:A.BX
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->

==> file2 <==
A.B mmddyy
A.BX hhmmss
PARM3 a\1b&c
PARM4 cc

$ awk -f tst.awk file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
a\1b&c
somyaml_cc
<END OF YAML CODE
---------------->
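
To illustrate why this test data matters, here is a small sketch comparing regexp-based sub() with the literal index()/substr() approach. The shell variable names and the one-line inputs are made up for the demonstration.

```shell
# sub() treats its first argument as a regexp: "." matches any character,
# so the unrelated text AxB is replaced as if it were the key A.B.
regex_hit=$(printf 'AxB\n' | awk '{ sub("A.B", "mmddyy") } 1')
printf '%s\n' "$regex_hit"

# In the replacement string of sub(), "&" stands for the matched text.
amp_hit=$(printf 'PARM3\n' | awk '{ sub("PARM3", "a&c") } 1')
printf '%s\n' "$amp_hit"

# index() and substr() compare and copy characters literally, so AxB is
# left alone and the "&" in the replacement is inserted as-is.
literal=$(printf 'AxB PARM3\n' | awk -v old='PARM3' -v new='a&c' '{
    s = index($0, old)
    if ( s ) $0 = substr($0, 1, s-1) new substr($0, s + length(old))
    print
}')
printf '%s\n' "$literal"
```

The first two commands print mmddyy and aPARM3c, neither of which is the intended literal replacement; the third prints AxB a&c.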

That's still missing cases to consider, e.g. where file2 contains recursive mappings or even just one-off mappings from one "PARM" string to another:

$ head file1 file2
==> file1 <==
<SOME YAML CODE
-------------->
PARM1
value:PARM2
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->

==> file2 <==
PARM1 mmddyy
PARM2 hhmmss
PARM3 PARM4
PARM4 PARM3

$ awk -f tst.awk file2 file1
<SOME YAML CODE
-------------->
mmddyy
value:hhmmss
PARM4
somyaml_PARM4
<END OF YAML CODE
---------------->

But that's not discussed in the question, so I don't know whether that's the expected output and, if it isn't, what the expected output WOULD be and why. Update the question if you need to handle that or any of the other non-trivial cases differently.
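
If the intended behavior for such mappings were a true swap, where text that has already been replaced is never replaced again (an assumption, since the question does not say), one possible sketch is a single left-to-right pass that tries the longest matching key at each position. The variable names best, rest and out are illustrative, and this is an alternative to tst.awk rather than part of it.

```shell
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
<SOME YAML CODE
-------------->
PARM1
value:PARM2
PARM3
somyaml_PARM4
<END OF YAML CODE
---------------->
EOF

cat > "$tmp/file2" <<'EOF'
PARM1 mmddyy
PARM2 hhmmss
PARM3 PARM4
PARM4 PARM3
EOF

result=$(awk '
NR==FNR {
    if ( NF ) {
        map[$1] = $2
    }
    next
}
{
    out  = ""
    rest = $0
    while ( length(rest) ) {
        # Find the longest key that starts at the current position.
        best = ""
        for ( old in map ) {
            if ( length(old) > length(best) && index(rest, old) == 1 ) {
                best = old
            }
        }
        if ( best != "" ) {
            # Emit the replacement and skip past the key without rescanning.
            out  = out map[best]
            rest = substr(rest, length(best) + 1)
        } else {
            # No key starts here: copy one character and move on.
            out  = out substr(rest, 1, 1)
            rest = substr(rest, 2)
        }
    }
    print out
}
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this data the fifth line becomes PARM4 and the sixth becomes somyaml_PARM3, because each occurrence is mapped exactly once. It needs no sorting, at the cost of checking every key at every character position.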
