Find a string in one file and replace it with its match in another file

Let's say I have two large files. The first one looks like this (File1):

 N      4764 
    56.067000   50.667000   24.026000
 HT1    4765 
    55.129000   51.012000   24.198000
 HT2    4766 
    56.059000   50.183000   23.126000

and the second one looks like this (File2):

N    NH2     -0.850000
HT1  H        0.222000
HT2  H        0.222000

I would like to replace every N, HT1, and so on in the first file with its match from the second column of File2, so the outcome would be:

Outcome:

 NH2    4764 
    56.067000   50.667000   24.026000
 H      4765 
    55.129000   51.012000   24.198000
 H      4766 
    56.059000   50.183000   23.126000

I am trying to do it with 'sed' but have not gotten it to work yet. Maybe awk is a better option?

*Edit: My initial examples looked confusing, so I replaced them with excerpts from the actual files I am dealing with. These are just three lines of my files.

CodePudding user response：

If both files are sorted on the first field, then a simple join command will give you the expected result:

join -o 2.2,1.2,1.3 file1.txt file2.txt

For the example data as originally posted in the question (before it was edited), this prints:

A3 125 111
B1 132 195
C56 145 695
D3 177 1001

If not, then you can use awk:

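awk '
    FNR == NR { arr[$1] = $2 OFS $3; next }   # file1.txt: save fields 2 and 3, keyed by field 1
    $1 in arr { print $2,arr[$1] }            # file2.txt: print the replacement, then the saved fields
' file1.txt file2.txt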
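
The join approach can also be used on unsorted input by sorting both files on the key first. This is a rough sketch, assuming one record per line with the key in the first field. The function name join_sorted and the temporary files are made up for illustration. LC_ALL=C is set so that sort and join agree on the collation order:

```shell
#!/bin/sh
# Sketch: sort both inputs on field 1, then join them.
# join_sorted is a hypothetical helper name, not an existing command.
join_sorted() {
    # $1 = data file (key value value), $2 = lookup file (key replacement)
    data_sorted=$(mktemp) || return 1
    lookup_sorted=$(mktemp) || return 1
    LC_ALL=C sort -k1,1 "$1" > "$data_sorted"
    LC_ALL=C sort -k1,1 "$2" > "$lookup_sorted"
    # Output: replacement (file 2, field 2), then fields 2 and 3 of file 1
    LC_ALL=C join -o 2.2,1.2,1.3 "$data_sorted" "$lookup_sorted"
    rm -f "$data_sorted" "$lookup_sorted"
}
```

It would be called as: join_sorted file1.txt file2.txt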

CodePudding user response：

One awk idea:

awk '
FNR==NR { a[$1]=$2; next }      # 1st file: save entries in array
$1 in a { $1=a[$1] }            # 2nd file: if $1 is an index in array then replace $1 with match from array
1                               # print current line
' File2 File1

This generates:

NH2 4764
    56.067000   50.667000   24.026000
H 4765
    55.129000   51.012000   24.198000
H 4766
    56.059000   50.183000   23.126000

NOTE: this assumes the original spacing does not need to be preserved on the lines that undergo a replacement; assigning to $1 makes awk rebuild the line with single spaces between fields
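
If the column layout does need to be kept, one option is to rebuild the replaced lines with printf instead of assigning to $1. This is a minimal sketch, assuming the name lines have exactly two fields laid out as one leading space, a 6-character left-justified name, one space, and then the number. These widths are read off the sample lines in the question and may not hold for the full files. The function name replace_names is made up for illustration:

```shell
#!/bin/sh
# Sketch: replace the first field from a lookup table while keeping columns aligned.
# replace_names is a hypothetical helper; the column widths are assumptions.
replace_names() {
    # $1 = lookup file (File2), $2 = data file (File1)
    awk '
    FNR == NR { map[$1] = $2; next }        # lookup file: name -> replacement
    NF == 2 && ($1 in map) {                # two-field line whose name is known
        printf " %-6s %s\n", map[$1], $2    # rebuild with the assumed fixed widths
        next
    }
    { print }                               # all other lines pass through unchanged
    ' "$1" "$2"
}
```

It would be called as: replace_names File2 File1 > File1.new. The coordinate lines have three fields, so they are printed exactly as they were read, including their leading spaces.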