The file I want to edit looks like this:
chr1 24809154 24809669
chr1 24546969 24547563
chr1 7932037 7932594
chr3 42012155 42012598
chr3 923035 923549
chr4 5799575 5799990
chr4 6895845 6896348
chr4 2337251 2337743
chr5 10715994 10716426
chr5 4385445 4385878
I also have a reference table that maps each value in the first column to an alternative name:
chr1 scaffold_A
chr2 scaffold_B
chr3 scaffold_C
chr4 scaffold_D
chr5 scaffold_E
How can I use the reference table to rename the first-column values in the first file, so that the final output is:
scaffold_A 24809154 24809669
scaffold_A 24546969 24547563
scaffold_A 7932037 7932594
scaffold_C 42012155 42012598
scaffold_C 923035 923549
scaffold_D 5799575 5799990
scaffold_D 6895845 6896348
scaffold_D 2337251 2337743
scaffold_E 10715994 10716426
scaffold_E 4385445 4385878
CodePudding user response:
I think the easiest way is to write a script that loops through the table line by line, gets the first field (field1 below), looks up its substituting value (subs1 below), and finally makes the substitution using sed on a copy of the table (renamed.txt):
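#!/bin/bash
# Work on a copy so table.txt stays untouched
cp "table.txt" "renamed.txt"
while IFS= read -r line; do
    # First whitespace-separated field of the current line
    field1=$(echo "${line}" | awk '{print $1;}')
    # Exact match on column 1 of ref.txt, so chr1 does not also match chr10
    subs1=$(awk -v key="${field1}" '$1 == key {print $2; exit}' "ref.txt")
    # Skip names missing from ref.txt instead of replacing them with nothing
    [ -n "${subs1}" ] || continue
    # Anchor to the start of the line and require a following blank (GNU sed -i)
    sed -i "s/^${field1}\([[:space:]]\)/${subs1}\1/" "renamed.txt"
done < "table.txt"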
Testing:
$ cat table.txt
chr1 24809154 24809669
chr1 24546969 24547563
chr1 7932037 7932594
chr3 42012155 42012598
chr3 923035 923549
chr4 5799575 5799990
chr4 6895845 6896348
chr4 2337251 2337743
chr5 10715994 10716426
chr5 4385445 4385878
$ cat ref.txt
chr1 scaffold_A
chr2 scaffold_B
chr3 scaffold_C
chr4 scaffold_D
chr5 scaffold_E
$ ./rename_table.sh
$ cat renamed.txt
scaffold_A 24809154 24809669
scaffold_A 24546969 24547563
scaffold_A 7932037 7932594
scaffold_C 42012155 42012598
scaffold_C 923035 923549
scaffold_D 5799575 5799990
scaffold_D 6895845 6896348
scaffold_D 2337251 2337743
scaffold_E 10715994 10716426
scaffold_E 4385445 4385878
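One thing to watch with lookups like this: names such as chr1 and chr10 share a prefix, so a plain pattern search can return the wrong row, while comparing the whole field cannot. A minimal sketch of the difference follows. The helper name lookup_exact and the chr10/scaffold_J entry are made up for illustration and are not part of the question's data.

```shell
#!/bin/bash
# lookup_exact is a hypothetical helper; chr10/scaffold_J are made-up entries
# added only to show the prefix problem.
lookup_exact() {
    # Print column 2 of the first row whose column 1 equals the key exactly
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

ref=$(mktemp)
printf '%s\n' 'chr10 scaffold_J' 'chr1 scaffold_A' > "${ref}"

# A plain pattern search also hits chr10, because chr1 is a prefix of it
grep -m1 'chr1' "${ref}" | awk '{print $2}'   # prints scaffold_J
# Comparing the whole field returns the intended row
lookup_exact 'chr1' "${ref}"                   # prints scaffold_A
rm -f "${ref}"
```

The same reasoning applies to the sed step: anchoring the pattern to the start of the line and to the following blank keeps a short name from rewriting part of a longer one.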
CodePudding user response:
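Using awk: read the reference table into an array keyed on its first column, then replace the first field of every line that has a mapping.
$ awk 'NR==FNR {a[$1]=$2; next} $1 in a {$1=a[$1]} 1' reference.table file1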
scaffold_A 24809154 24809669
scaffold_A 24546969 24547563
scaffold_A 7932037 7932594
scaffold_C 42012155 42012598
scaffold_C 923035 923549
scaffold_D 5799575 5799990
scaffold_D 6895845 6896348
scaffold_D 2337251 2337743
scaffold_E 10715994 10716426
scaffold_E 4385445 4385878
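For readers who find the one-liner dense, the same two-file idiom can be spelled out as a small function. This is only a sketch under a few assumptions: the name rename_first_column is hypothetical, fields are whitespace-separated, and lines whose first field has no entry in the reference table are passed through unchanged with a warning on stderr.

```shell
#!/bin/bash
# rename_first_column is a hypothetical helper name, not part of the answers above.
# Usage: rename_first_column REF TABLE
rename_first_column() {
    awk '
        # While reading the first file (REF), store old name -> new name
        NR == FNR { map[$1] = $2; next }
        # For every line of the second file (TABLE)
        {
            if ($1 in map) {
                # Known name: swap in the replacement
                $1 = map[$1]
            } else if (NF) {
                # Unknown name on a non-empty line: warn and keep it as-is
                print "no mapping for " $1 > "/dev/stderr"
            }
            print
        }
    ' "$1" "$2"
}

# Demo on a few lines of the sample data, written to a temporary directory
tmpdir=$(mktemp -d)
printf '%s\n' 'chr1 scaffold_A' 'chr3 scaffold_C' > "${tmpdir}/ref.txt"
printf '%s\n' 'chr1 7932037 7932594' 'chr3 923035 923549' > "${tmpdir}/table.txt"
rename_first_column "${tmpdir}/ref.txt" "${tmpdir}/table.txt"
```

Note that assigning to $1 makes awk rebuild the line with single spaces between fields; if the real file is tab-delimited, setting OFS to a tab would probably be needed.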