Ubuntu: How do I extract only specific columns from tab-delimited file if it contains a specific str

Time:03-18

I want to extract lines with chr6.fa from the FC305JN_s_1_eland_result.txt file. Then, I want to extract only columns 1, 2, 7, 8, and 9 from this sub-file.

grep -E "chr6.fa" FC305JN_s_1_eland_result.txt > out.txt
awk -F, '{OFS=",";print $1, $2, $7, $8, $9}' out.txt > outfile.txt
My out.txt is exactly the same as outfile.txt.

Small sample of the FC305JN_s_1_eland_result.txt file:

>FC305JN_20080525:1:15:1412:166 GTGAATCCTTATTCCGATATATATNNNN    U0  1   0   0   chrX.fa 45974622    R   ..
>FC305JN_20080525:1:15:944:72   GATGACTTCCTTAATTTTCTTTATNNNN    U0  1   0   0   chr6.fa 7200804 R   ..
>FC305JN_20080525:1:15:1049:473 GAATGGCAACACAAACAGGGCTGANNNN    R2  0   0   4
>FC305JN_20080525:1:15:1196:1959    GGGAGAAGCCTCCCCGCCTCGGCCNNNN    U2  0   0   1   chr17.fa    38386704    F   ..  17A 23T
>FC305JN_20080525:1:15:1034:505 GAAAATGTTTCAAATCAATTTCTANNNN    U0  1   0   0   chr2.fa 183305566   R   ..
>FC305JN_20080525:1:15:983:126  GGATAGAGAGTTTGCACTGAGTTGNNNN    U0  1   0   0   chrX.fa 92367529    F   ..
>FC305JN_20080525:1:15:1799:100 TTCAGCTTATTGATAAAGAAGCACNNNN    U0  1   0   0   chr6.fa 20979453    R   ..
>FC305JN_20080525:1:15:743:1028 GAATGGAATGGAATGGAAAGAAACNNNN    R1  0   33  255
>FC305JN_20080525:1:15:771:1076 GAGTTCACTAAACAAAAGAGTGTCNNNN    U2  0   0   1   chr6.fa 136877852   R   ..  7A  13G

Current output outfile.txt (Example):

>FC305JN_20080525:1:15:944:72   GATGACTTCCTTAATTTTCTTTATNNNN    U0  1   0   0   chr6.fa 7200804 R   ..
>FC305JN_20080525:1:15:1799:100 TTCAGCTTATTGATAAAGAAGCACNNNN    U0  1   0   0   chr6.fa 20979453    R   ..
>FC305JN_20080525:1:15:771:1076 GAGTTCACTAAACAAAAGAGTGTCNNNN    U2  0   0   1   chr6.fa 136877852   R   ..  7A  13G

Desired output (Example):

>FC305JN_20080525:1:15:944:72   GATGACTTCCTTAATTTTCTTTATNNNN    chr6.fa 7200804 R
>FC305JN_20080525:1:15:1799:100 TTCAGCTTATTGATAAAGAAGCACNNNN    chr6.fa 20979453    R
>FC305JN_20080525:1:15:771:1076 GAGTTCACTAAACAAAAGAGTGTCNNNN    chr6.fa 136877852   R

CodePudding user response:

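Your file is tab-delimited, but -F, tells awk to split on commas, so each whole line ends up in $1 and columns 2 to 9 are never separated. In full awk, with tab set as the delimiter: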

$ awk 'BEGIN {
    FS=OFS="\t"               # set correct delimiters
}
$7~/chr6\.fa/ {               # replaces the grep part
    print $1, $2, $7, $8, $9  # output
}'  file                      # your file goes here

Output:

>FC305JN_20080525:1:15:944:72   GATGACTTCCTTAATTTTCTTTATNNNN    chr6.fa 7200804 R
>FC305JN_20080525:1:15:1799:100 TTCAGCTTATTGATAAAGAAGCACNNNN    chr6.fa 20979453        R
>FC305JN_20080525:1:15:771:1076 GAGTTCACTAAACAAAAGAGTGTCNNNN    chr6.fa 136877852       R
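
The same idea can be sketched as a one-liner that sets the delimiters on the command line and compares column 7 exactly instead of using a regex. The file name sample.txt and the shortened read IDs (r1 to r4) are made up for illustration; only the column layout follows the sample above.

```shell
# Build a small tab-delimited sample (hypothetical file name and read IDs).
printf '>r1\tAAAA\tU0\t1\t0\t0\tchrX.fa\t100\tR\t..\n'  > sample.txt
printf '>r2\tCCCC\tU0\t1\t0\t0\tchr6.fa\t200\tR\t..\n' >> sample.txt
printf '>r3\tGGGG\tR2\t0\t0\t4\n'                      >> sample.txt
printf '>r4\tTTTT\tU2\t0\t0\t1\tchr6.fa\t300\tF\t..\t7A\t13G\n' >> sample.txt

# -F'\t' splits input on tabs; -v OFS='\t' joins the printed fields with tabs.
# $7 == "chr6.fa" is an exact string comparison on column 7.
result=$(awk -F'\t' -v OFS='\t' '$7 == "chr6.fa" { print $1, $2, $7, $8, $9 }' sample.txt)
printf '%s\n' "$result"
```

The exact comparison skips lines where column 7 merely contains chr6.fa as part of a longer name, which may or may not be what you want.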

CodePudding user response:

Simplifying your code (with code borrowed from "Extract column using grep"):

grep -E "chr6.fa" FC305JN_s_1_eland_result.txt > out.txt
awk -v OFS='\t' '{print $1, $2, $7, $8, $9}' out.txt > outfile.txt

produces output:

>FC305JN_20080525:1:15:944:72     GATGACTTCCTTAATTTTCTTTATNNNN    chr6.fa     7200804     R
>FC305JN_20080525:1:15:1799:100   TTCAGCTTATTGATAAAGAAGCACNNNN    chr6.fa     20979453    R
>FC305JN_20080525:1:15:771:1076   GAGTTCACTAAACAAAAGAGTGTCNNNN    chr6.fa     136877852   R
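
If the intermediate out.txt is not needed, the two steps could also be piped together, with cut picking the columns. This is only a sketch: sample2.txt and the read IDs are invented, and it assumes the fields are separated by single tabs.

```shell
# Hypothetical sample file with tab-separated fields.
printf '>r1\tAAAA\tU0\t1\t0\t0\tchrX.fa\t100\tR\t..\n'  > sample2.txt
printf '>r2\tCCCC\tU0\t1\t0\t0\tchr6.fa\t200\tR\t..\n' >> sample2.txt
printf '>r3\tGGGG\tR2\t0\t0\t4\n'                      >> sample2.txt

# grep -F treats the pattern as a fixed string, so the dot is literal.
# cut splits on tabs by default; -f selects the wanted columns.
picked=$(grep -F 'chr6.fa' sample2.txt | cut -f1,2,7,8,9)
printf '%s\n' "$picked"
```

Note that grep matches chr6.fa anywhere on the line, not only in column 7.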