I have a .tsv file where I want to 1/ substitute the values of the 6th column onwards until the last column, 2/ exclude the 4th and 5th columns, 3/ rearrange, i.e. swap the 3rd column for the 2nd column and lastly include a column between the 2nd and 3rd columns where the value is the number 0 for all observation (lines).
1/ the values from the 6th column onwards can either be 0,1 or 2. What I want to do is: If the number is 0, substitute by 1\t1 (so it will create another column). If the number is 1, substitute by 1\t2 and if it is 2, substitute by 2/t2.
I've managed to do this by generating a new .tsv with only the values of the 6th column onwards, an then using the commands in awk below:
awk '{gsub(/2/,"2\t2");}1' file.txt > file1; awk '{gsub(/1/,"1\t2");}1' file1 > file2; awk '{gsub(/0/,"1\t1");}1' file2 > file3
After that I would work on the 2/ and 3/ problems. I would have a .tsv file generate with awk from the original .tsv file already excluding the 4th and 5th columns and swapping the 3rd and 2nd columns with this command:
awk -v OFS="\t" '{ print $1, $3, $2 }' original_tsv_file.tsv > reordered_tsv_file.tsv
Then, the next step is to include the number 0 as the 3rd column of the new generated reordered_tsv_file.tsv. I used the command below:
sed -i -e 's/^/0\t/' reordered_tsv_file.tsv
This would include the number 0 in front of every new line and now I would just do the awk command from the step before to reorder the columns like this:
awk -v OFS="\t" '{ print $2, $3, $1, $4 }' reordered_tsv_file.tsv > final_columns_to_be_merged_with_file3.tsv
And finally in order to get my desired file I would just paste the final_columns_to_be_merged_with_file3.tsv and the file3 with the command:
paste -d'\t' final_columns_to_be_merged_with_file3.tsv file3 > final_file.tsv
Below is and example of the original_tsv_file.tsv:
chr15 101152646 chr15:101152646:A:G A G 1 1 0 0
chr15 101152650 chr15:101152650:A:C A C 1 1 0 0
chr15 101152872 chr15:101152872:G:A G A 1 1 0 0
chr15 101152923 chr15:101152923:G:A G A 1 1 0 0
chr15 101152954 chr15:101152954:C:T C T 0 2 0 0
chr15 101153197 chr15:101153197:G:C G C 0 2 0 0
And the final_file.tsv:
chr15 chr15:101152646:A:G 0 101152646 1 2 1 2 1 1 1 1
chr15 chr15:101152650:A:C 0 101152650 1 2 1 2 1 1 1 1
chr15 chr15:101152872:G:A 0 101152872 1 2 1 2 1 1 1 1
chr15 chr15:101152923:G:A 0 101152923 1 2 1 2 1 1 1 1
chr15 chr15:101152954:C:T 0 101152954 1 1 2 2 1 1 1 1
chr15 chr15:101153197:G:C 0 101153197 1 1 2 2 1 1 1 1
What I want is a solution or advice on how can I make this process in a one piece of coding instead of using 5 different commands and tons of generated files in order to get the final result. All the commands above I managed to get it and modifying from the internet, but the I don't; have the knowledge yet to improve this.
Thanks for any help.
CodePudding user response:
you can try it
awk 'BEGIN{OFS="\t"}
{printf "%s%s%s%s0%s%s", $1, OFS, $3, OFS, OFS, $2;
for(i=6;i<=NF; i){
if($i==0) printf OFS"1"OFS"1";
else if($i==1) printf OFS"1"OFS"2";
else printf OFS"2"OFS"2"
} printf ORS}
' original_tsv_file.tsv > final_file.tsv
you get in final_file.tsv
chr15 chr15:101152646:A:G 0 101152646 1 2 1 2 1 1 1 1
chr15 chr15:101152650:A:C 0 101152650 1 2 1 2 1 1 1 1
chr15 chr15:101152872:G:A 0 101152872 1 2 1 2 1 1 1 1
chr15 chr15:101152923:G:A 0 101152923 1 2 1 2 1 1 1 1
chr15 chr15:101152954:C:T 0 101152954 1 1 2 2 1 1 1 1
chr15 chr15:101153197:G:C 0 101153197 1 1 2 2 1 1 1 1
CodePudding user response:
Assumptions:
- OP wants all output columns delimited by a tab (
\t
) (expected output appears to be fixed width, but OP's code mentionsOFS="\t"
) - columns 6 through EOL contain only one of 3 values:
0
or1
or2
One awk
idea:
awk '
BEGIN { OFS="\t" }
{ outline= $1 OFS $3 OFS "0" OFS $2
for (i=6;i<=NF;i )
if ($i==0) outline=outline OFS 1 OFS 1
else if ($i==1) outline=outline OFS 1 OFS 2
else outline=outline OFS 2 OFS 2
print outline
}
' file.txt
A slightly convoluted idea using a couple ternary operations:
awk '
BEGIN { OFS="\t" }
{ outline= $1 OFS $3 OFS "0" OFS $2
for (i=6;i<=NF;i )
outline=outline OFS \
(($i==0) ? 1 OFS 1 : (($i==1) ? 1 OFS 2 : 2 OFS 2))
print outline
}
' file.txt
Both of these generate:
chr15 chr15:101152646:A:G 0 101152646 1 2 1 2 1 1 1 1
chr15 chr15:101152650:A:C 0 101152650 1 2 1 2 1 1 1 1
chr15 chr15:101152872:G:A 0 101152872 1 2 1 2 1 1 1 1
chr15 chr15:101152923:G:A 0 101152923 1 2 1 2 1 1 1 1
chr15 chr15:101152954:C:T 0 101152954 1 1 2 2 1 1 1 1
chr15 chr15:101153197:G:C 0 101153197 1 1 2 2 1 1 1 1