Edit the ID column to add "chr" in front of a number

Time:04-29

I have a VCF file: a VCF header followed by genotype information. I want to add "chr" to the start of the third column. Currently a line looks like:

21 9825796 21_9825796_C_T_b37

I want to add "chr" in front of the third column, so it should look like:

21 9825796 chr21_9825796_C_T_b37

I used this command:

awk '{if($0 !~ /^#/) print "chr"$3; else print $3}' chr21_annotate.vcf > chr21_annotate_38_impute.vcf

But I am not able to get the desired output: the command prints only the third column instead of the whole line. Can anyone help?

Answer:

mawk '/^[^#]/*sub(/^/,"chr",$3)' test.vcf

# comment 1 
21 9825796 21_9825796_C_T_b37
43 82852851 43_82852851_C_T_b37

===before ^ ========after v ========

21 9825796 chr21_9825796_C_T_b37
43 82852851 chr43_82852851_C_T_b37

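Note that lines starting with # (such as the "# comment 1" line above, or a VCF header) are not printed at all, as the output above shows.

If you want to try more exotic syntax, this does the same thing:

mawk '($3="chr"$3) && /^[^#]/' test.vcf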

# comment 1 
21 9825796 21_9825796_C_T_b37
43 82852851 43_82852851_C_T_b37

===before ^ ========after v ========

21 9825796 chr21_9825796_C_T_b37
43 82852851 chr43_82852851_C_T_b37

Answer:

Assuming that you really do have lines starting with # somewhere in your input that you don't want to change (per your code), that you don't want to change the white space between fields (per the image you posted), that it should be robust (so it still works even if earlier fields contain the same string as $3), and that it should be portable, this will do what you want using any POSIX sed (POSIX is needed for the [:space:] character class):

$ sed 's/^[^#][^[:space:]]*[[:space:]]*[^[:space:]]*[[:space:]]*/&chr/' file
21 9825796 chr21_9825796_C_T_b37
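As a quick sanity check (a sketch using a made-up two-line sample, not the asker's real file), the sed command leaves a ## header line untouched and keeps the tab separators that VCF data lines use, because [[:space:]] matches tabs as well as spaces:

```shell
# Hypothetical sample: one meta-information line plus one tab-separated data line.
# Lines starting with # fail the ^[^#] anchor and pass through unchanged;
# on data lines, & re-emits the matched first two fields (tabs included) before "chr".
printf '##fileformat=VCFv4.2\n21\t9825796\t21_9825796_C_T_b37\n' |
  sed 's/^[^#][^[:space:]]*[[:space:]]*[^[:space:]]*[[:space:]]*/&chr/'
```

This prints the header line as-is, followed by the data line with its tabs intact and chr21_9825796_C_T_b37 as the third field.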

If you don't care about the white space changing (assigning to $3 makes awk rebuild the line using the default OFS, a single space), then just do this with any awk:

$ awk '!/^#/{$3="chr"$3} 1' file
21 9825796 chr21_9825796_C_T_b37
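If you prefer the awk version but still want to keep the tabs, one option is to set both FS and OFS to a tab so the rebuilt line is joined with tabs again. This is a minimal sketch that assumes the file really is tab-delimited (the VCF format specifies tab-separated data lines); the sample line below is made up:

```shell
# Assumes tab-delimited input, as VCF data lines are.
# -F'\t' splits fields on tabs only; -v OFS='\t' rejoins a modified line with tabs.
# Lines starting with # are not modified, and the trailing 1 prints every line.
printf '##fileformat=VCFv4.2\n21\t9825796\t21_9825796_C_T_b37\tC\tT\n' |
  awk -F'\t' -v OFS='\t' '!/^#/ { $3 = "chr" $3 } 1'
```

To run it on your own data, drop the printf line and pass chr21_annotate.vcf as the input file, redirecting the output to chr21_annotate_38_impute.vcf as in your original command.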