How to rename values in column of a tab-delimited table using another reference table in Bash

Time:09-09

The file I want to edit looks like this:

chr1    24809154    24809669
chr1    24546969    24547563
chr1    7932037 7932594
chr3    42012155    42012598
chr3    923035  923549
chr4    5799575 5799990
chr4    6895845 6896348
chr4    2337251 2337743
chr5    10715994    10716426
chr5    4385445 4385878

And I have another reference table file that has alternative values for the first column:

chr1    scaffold_A  
chr2    scaffold_B  
chr3    scaffold_C  
chr4    scaffold_D  
chr5    scaffold_E  

How can I use the reference table to rename the values in the first column of the first table, so that the final output is:

scaffold_A  24809154    24809669
scaffold_A  24546969    24547563
scaffold_A  7932037 7932594
scaffold_C  42012155    42012598
scaffold_C  923035  923549
scaffold_D  5799575 5799990
scaffold_D  6895845 6896348
scaffold_D  2337251 2337743
scaffold_E  10715994    10716426
scaffold_E  4385445 4385878

CodePudding user response:

I think the easiest way is to write a script that loops through the table line by line, takes the first field (field1 below), looks up its replacement in the reference table (subs1 below), and finally makes the substitution with sed on a copy of the table (renamed.txt):

#!/bin/bash

cp "table.txt" "renamed.txt"

while IFS= read -r line; do

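  # First column of the current row
  field1=$(echo "${line}" | awk '{print $1;}')
  # Exact match on column 1 of the reference table, so chr1 cannot pick up chr10
  subs1=$(awk -v key="${field1}" '$1 == key {print $2; exit}' "ref.txt")

  # No mapping found: leave this name as it is
  [ -n "${subs1}" ] || continue

  # Anchor at line start and require trailing whitespace, so only a whole first field is replaced
  sed -i "s/^${field1}\([[:space:]]\)/${subs1}\1/" "renamed.txt"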

done < "table.txt"

Testing:

$ cat table.txt 
chr1    24809154    24809669
chr1    24546969    24547563
chr1    7932037 7932594
chr3    42012155    42012598
chr3    923035  923549
chr4    5799575 5799990
chr4    6895845 6896348
chr4    2337251 2337743
chr5    10715994    10716426
chr5    4385445 4385878

$ cat ref.txt 
chr1    scaffold_A  
chr2    scaffold_B  
chr3    scaffold_C  
chr4    scaffold_D  
chr5    scaffold_E  

$ ./rename_table.sh 
$ cat renamed.txt 
scaffold_A    24809154    24809669
scaffold_A    24546969    24547563
scaffold_A    7932037 7932594
scaffold_C    42012155    42012598
scaffold_C    923035  923549
scaffold_D    5799575 5799990
scaffold_D    6895845 6896348
scaffold_D    2337251 2337743
scaffold_E    10715994    10716426
scaffold_E    4385445 4385878
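
The loop above calls sed once per row, so every row triggers a full rewrite of renamed.txt. As a rough sketch of a single-pass alternative, the reference table can be loaded into a Bash associative array first. This assumes Bash 4 or newer and tab-delimited input; rename_column is a hypothetical helper name, not part of the script above.

```shell
#!/bin/bash
# Sketch only: rename column 1 in one pass with an associative array.
# Assumes Bash 4+ and a tab-delimited table; rename_column is a made-up name.

rename_column() {
  local ref_file=$1 table_file=$2
  local -A map=()            # old name -> new name
  local old new rest

  # Load the reference table; the default IFS splits on tabs and spaces,
  # and the trailing "_" swallows any extra columns or trailing whitespace.
  while read -r old new _; do
    if [[ -n $old && -n $new ]]; then
      map[$old]=$new
    fi
  done < "$ref_file"

  # Split each row into its first field and the rest of the line.
  # The "|| [[ -n $old ]]" part keeps a last row that lacks a final newline.
  while IFS=$'\t' read -r old rest || [[ -n $old ]]; do
    [[ -z $old ]] && continue                       # skip blank lines
    # Use the mapped name if there is one, otherwise keep the original.
    printf '%s\t%s\n' "${map[$old]:-$old}" "$rest"
  done < "$table_file"
}
```

It could then be called as rename_column ref.txt table.txt > renamed.txt. Names missing from the reference table pass through unchanged rather than being blanked.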

CodePudding user response:

Using awk, with OFS set to a tab so the output stays tab-delimited, and with names that have no entry in the reference table left unchanged:

$ awk -v OFS='\t' 'NR==FNR {a[$1]=$2; next} ($1 in a) {$1=a[$1]} 1' reference.table file1
scaffold_A  24809154    24809669
scaffold_A  24546969    24547563
scaffold_A  7932037 7932594
scaffold_C  42012155    42012598
scaffold_C  923035  923549
scaffold_D  5799575 5799990
scaffold_D  6895845 6896348
scaffold_D  2337251 2337743
scaffold_E  10715994    10716426
scaffold_E  4385445 4385878
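
The one-liner packs several awk rules into a single string, which makes it hard to read at first. Below is a commented sketch of the same two-file idiom, one rule per line. The wrapper name rename_with_awk is hypothetical. This version sets OFS to a tab and leaves names without a mapping unchanged.

```shell
# Sketch only: the two-file awk idiom written out with comments.
# rename_with_awk is a made-up wrapper name.
# Usage: rename_with_awk REFERENCE_TABLE DATA_TABLE
rename_with_awk() {
  awk -v OFS='\t' '
    # While reading the first file, the total record count NR equals the
    # per-file count FNR, so this rule only stores the mapping and moves on.
    NR == FNR { map[$1] = $2; next }

    # Second file: replace column 1 only if the name is in the mapping.
    # Assigning to $1 rebuilds the row using OFS (a tab).
    ($1 in map) { $1 = map[$1] }

    # Print every row of the second file, changed or not.
    { print }
  ' "$1" "$2"
}
```

One caveat of the NR==FNR test: if the reference file is empty, the condition also holds for the second file, so nothing is printed.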