Let's say I have two large files. The first one, File1, looks like this:
N 4764
56.067000 50.667000 24.026000
HT1 4765
55.129000 51.012000 24.198000
HT2 4766
56.059000 50.183000 23.126000
The second one, File2, looks like this:
N NH2 -0.850000
HT1 H 0.222000
HT2 H 0.222000
I would like to replace every N, HT1, and so on in the first file with its match from the second column of File2, so the outcome would be:
Outcome:
NH2 4764
56.067000 50.667000 24.026000
H 4765
55.129000 51.012000 24.198000
H 4766
56.059000 50.183000 23.126000
I have been trying to do this with sed, but it has not worked yet. Maybe awk is a better option?
Edit: my initial examples were confusing, so I replaced them with excerpts from the actual files I am working with. These are just three entries from each file.
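Since sed was mentioned, here is a minimal sketch of one sed-based approach, assuming fields are separated by a single space and that the keys and replacements contain no sed regex metacharacters or "/". awk turns each File2 line into an anchored substitution followed by a t command (which ends processing of the current line once a substitution has succeeded, so a replacement is never rewritten by a later rule), and sed -f applies the generated script to File1. The script name subs.sed is a hypothetical choice.

```shell
#!/bin/sh
# Sample data copied from the question's excerpts.
cat > File1 <<'EOF'
N 4764
56.067000 50.667000 24.026000
HT1 4765
55.129000 51.012000 24.198000
HT2 4766
56.059000 50.183000 23.126000
EOF
cat > File2 <<'EOF'
N NH2 -0.850000
HT1 H 0.222000
HT2 H 0.222000
EOF

# Generate one rule per File2 line, e.g. "s/^N /NH2 /" followed by "t".
# Assumption: single-space separators and no regex metacharacters in the keys.
awk '{ printf "s/^%s /%s /\nt\n", $1, $2 }' File2 > subs.sed

# Apply the generated rules; coordinate lines match no rule and pass through unchanged.
sed -f subs.sed File1
```

The same lookup table drives both this and the awk answers; the difference is that here it is compiled into a sed script instead of being held in an awk array.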
Answer:
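If both files are sorted on their first field, then a simple join command will give you the expected result (this answer was written against the question's original examples, so its output differs from the data shown above):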
join -o 2.2,1.2,1.3 file1.txt file2.txt
A3 125 111
B1 132 195
C56 145 695
D3 177 1001
If not, then you can use awk:
awk '
FNR == NR { arr[$1] = $2 OFS $3; next }   # 1st file: map field 1 to fields 2 and 3
$1 in arr { print $2, arr[$1] }           # 2nd file: print its field 2, then the saved fields
' file1.txt file2.txt
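Alternatively, unsorted files can still go through join if they are sorted on the join field first. A minimal sketch, using small hypothetical files (not the question's data) that are deliberately out of order; sort and join should run under the same locale so that they agree on the ordering. The .sorted file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample files, deliberately unsorted; the key is field 1.
printf 'k2 20 21\nk1 10 11\n' > file1.txt
printf 'k2 B\nk1 A\n' > file2.txt

# join expects both inputs sorted on the join field (field 1 by default).
sort -k1,1 file1.txt > file1.sorted
sort -k1,1 file2.txt > file2.sorted

# -o 2.2,1.2,1.3: field 2 of the second file, then fields 2 and 3 of the first.
join -o 2.2,1.2,1.3 file1.sorted file2.sorted
```

The awk version above avoids the sorting step entirely because it looks keys up in an in-memory array, at the cost of holding the first file in memory.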
Answer:
One awk idea:
awk '
FNR==NR { a[$1]=$2; next } # 1st file: save entries in array
$1 in a { $1=a[$1] } # 2nd file: if $1 is an index in array then replace $1 with match from array
1 # print current line
' File2 File1
This generates:
NH2 4764
56.067000 50.667000 24.026000
H 4765
55.129000 51.012000 24.198000
H 4766
56.059000 50.183000 23.126000
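NOTE: this assumes the original spacing does not need to be preserved on lines where a replacement is made; assigning to $1 makes awk rebuild the whole line using OFS (a single space by default).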
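If the spacing does need to be kept, one option is to splice the replacement into the raw line instead of assigning to $1. A minimal sketch, assuming fields are separated by spaces or tabs (awk's default splitting): match() locates the first field, and substr() keeps everything before and after it unchanged. The sample File1 below uses a tab after the key only to show that the separator survives; the output file name Outcome is a hypothetical choice.

```shell
#!/bin/sh
# Question-style sample data, with a tab after each key in File1.
printf 'N\t4764\n56.067000 50.667000 24.026000\nHT1\t4765\n' > File1
printf 'N NH2 -0.850000\nHT1 H 0.222000\n' > File2

awk '
FNR == NR { a[$1] = $2; next }     # 1st file (File2): key -> replacement
$1 in a {
    match($0, /[^ \t]+/)           # locate the first field in the raw line
    # splice in the replacement, keeping the text before and after it as-is
    $0 = substr($0, 1, RSTART - 1) a[$1] substr($0, RSTART + RLENGTH)
}
1                                  # print every line of File1
' File2 File1 | tee Outcome
```

Using substr() rather than sub() also avoids surprises if a replacement string contains "&", which sub() would treat as "the matched text".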