How to use Linux to find the relationship between the UMI and the barcode from sequence reads

Time:10-10

Objective
Magnetic bead probes are used to capture mRNA. Each bead carries a specific sequence, called the UMI. Because each bead captures only one transcript, reads that share the same UMI can be clustered as coming from the same transcript, which supports later assembly, quantification, and alternative splicing analysis.

Task requirements:
Based on the structure of the UMI construct, find the UMI in each read and record the relationship between the UMI sequence and the barcode of that read.

Description:

The UMI construct has the following structure:

GGAAACAGCTATGACCATGNNNNNNNNNNNNNNNNTTTTTTTT

Fixed sequence: GGAAACAGCTATGACCATG
UMI sequence: the run of N's is a random 16 bp UMI
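The structure above (fixed sequence, 16 bp UMI, then a run of T's) can be written as a regular expression. The following is only a minimal sketch: the names `find_umi`, `revcomp` and `MIN_T` are made up for illustration, and requiring at least 3 T's after the UMI is an assumption (the template itself shows 8). UMIs containing an uncalled base (N) are skipped here, which is also a choice, not something stated in the post.

```python
import re

FIXED = "GGAAACAGCTATGACCATG"  # fixed sequence from the post
UMI_LEN = 16                   # UMI length from the post
MIN_T = 3                      # ASSUMPTION: minimum number of T's accepted as oligo dT

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Expands to: GGAAACAGCTATGACCATG([ACGT]{16})T{3,}
PATTERN = re.compile(f"{FIXED}([ACGT]{{{UMI_LEN}}})T{{{MIN_T},}}")


def find_umi(seq):
    """Return the UMI found in seq or in its reverse complement, else None."""
    for strand in (seq, revcomp(seq)):
        match = PATTERN.search(strand)
        if match:
            return match.group(1)
    return None
```

Calling `find_umi(read_sequence)` returns the 16 bp UMI as it appears on the strand that carries the fixed sequence, regardless of which orientation the read was sequenced in.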

Search strategy: the following conditions must be satisfied at the same time:
1) traverse the fastq and find the fixed sequence
2) after an interval of 16 bp, find the 3 oligo dT sequence
(note that the reverse complement sequence must also be considered)
When both conditions are met, the relationship can be established:
1. Extract the UMI sequence
2. Find the corresponding barcode number
3. Build a table of the correspondence between UMI and barcode
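The steps above can be sketched in Python by following them literally: locate the fixed sequence, skip 16 bp, check for T's, and count each (UMI, barcode) pair. This is a rough sketch under stated assumptions, not a definitive solution. In particular, the post does not say where the barcode is stored, so `barcode_of` assumes a hypothetical read header of the form `@name#BARCODE/1`; it has to be adapted to the real headers. `MIN_T` (how many T's count as the oligo dT) is also an assumption.

```python
from collections import defaultdict

FIXED = "GGAAACAGCTATGACCATG"  # fixed sequence from the post
UMI_LEN = 16                   # UMI length from the post
MIN_T = 3                      # ASSUMPTION: T's required right after the UMI
COMP = str.maketrans("ACGTN", "TGCAN")


def extract_umi(seq):
    """Find FIXED, skip UMI_LEN bases, require MIN_T T's; try both strands."""
    for strand in (seq, seq.translate(COMP)[::-1]):  # forward, then reverse complement
        start = strand.find(FIXED)
        while start != -1:
            umi_start = start + len(FIXED)
            umi_end = umi_start + UMI_LEN
            if strand[umi_end:umi_end + MIN_T] == "T" * MIN_T:
                return strand[umi_start:umi_end]
            start = strand.find(FIXED, start + 1)  # try the next occurrence
    return None


def barcode_of(read_id):
    """HYPOTHETICAL header layout '@name#BARCODE/1'; returns None if no '#'."""
    if "#" not in read_id:
        return None
    return read_id.split("#", 1)[1].split("/", 1)[0]


def build_table(records):
    """records: iterable of (read_id, sequence) -> {umi: {barcode: read count}}."""
    table = defaultdict(lambda: defaultdict(int))
    for read_id, seq in records:
        umi = extract_umi(seq.upper())
        barcode = barcode_of(read_id)
        if umi is not None and barcode is not None:
            table[umi][barcode] += 1
    return table
```

The resulting dictionary can be written out as a tab-separated table with one line per UMI, barcode and read count. Keeping the counts makes it easier to spot a UMI that shows up with more than one barcode.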
Data:
/hwfssz5/ST_BIGDATA/USER/xujunhao/project/course/result/split_read.1_rename.fq.gz
/hwfssz5/ST_BIGDATA/USER/xujunhao/project/course/result/split_read.2_rename.fq.gz
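Gzipped fastq files like the two above can be streamed without decompressing them to disk first. A minimal sketch, assuming the standard four-line fastq layout (no wrapped sequence lines); `read_fastq` is an illustrative name, not an existing tool:

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (read_id, sequence) from a gzipped fastq, four lines per record."""
    with gzip.open(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))  # header, sequence, '+', quality
            if len(record) < 4:               # end of file or truncated record
                break
            yield record[0].rstrip("\n"), record[1].rstrip("\n")
```

Each file is read one record at a time, so memory use stays small even for large inputs. Since it is not stated which of the two read files carries the UMI, it may be safest to scan both.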

Could an expert explain how this should be done? Thank you. It would be best if code were included.