Transcriptome Annotation

Whole-transcriptome analysis using next-generation sequencing makes it possible to interrogate genes and their expression without knowledge of the underlying genome. Despite its unprecedented sensitivity and accuracy, a number of challenges remain. A transfrag is a reconstructed transcript that can be traced to a genomic locus; it shows transcriptional activity but has an unknown transcript structure. Correct assembly of transfrags is complicated by heterogeneous expression levels, sequence bias and alternative splicing. Furthermore, the methods for generating RNA-seq data and assembling them are naïve to non-coding RNAs, which eventually frustrates downstream annotation efforts.

There is no general consensus on assembly quality metrics, so additional heuristic methods such as BLAST are applied to identify bona fide transfrags. To address these problems, we propose a bioinformatics workflow, built on custom Perl scripts, that integrates a reference-free assembly algorithm with support vector machine analysis to flag chimeric, non-coding and truncated transfrags as potential noise. We enhance selection by aligning the reconstructed transfrags to a cohort of well-annotated proteins from Swiss-Prot. We are implementing this approach on apple scab (Venturia inaequalis) RNA-seq data and examining its validity on a Neurospora crassa test dataset from the Short Read Archive.
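
As an illustration of the Swiss-Prot selection step, the following minimal sketch keeps the best protein hit per transfrag from BLAST tabular output. It is written in Python for brevity rather than the Perl used in the workflow, and it assumes the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The function name `best_hits`, the e-value cutoff of 1e-5 and the identifiers in the example are hypothetical and are not taken from the workflow itself.

```python
def best_hits(blast_lines, max_evalue=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Assumes 12-column, tab-separated BLAST tabular output. The cutoff
    max_evalue is an illustrative assumption, not a recommended value.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hit too weak to support the transfrag
        # keep the hit with the highest bit score for each query
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical example records (identifiers are made up for illustration).
example = [
    "tf1\tsp|P1|A\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200",
    "tf1\tsp|P2|B\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-20\t120",
    "tf2\tsp|P3|C\t30.0\t40\t28\t0\t1\t120\t1\t40\t0.5\t25",
]
supported = best_hits(example)
```

Transfrags absent from the returned dictionary (here `tf2`) have no protein support at the chosen cutoff. In a workflow like the one described, they would be candidates for closer inspection rather than automatic removal.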

To facilitate gene discovery and annotation of the apple scab genome, we generated 31 million Solexa paired-end reads from a host-free culture of apple scab. Reads were pre-processed with a collection of in-house Perl scripts. After assembly with Oases over a narrow range of k-mers, we implemented a number of merging approaches to achieve higher transcript turnover with the N. crassa test data. We are currently selecting bona fide V. inaequalis transfrags.
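
One simple way to picture merging assemblies from several k-mer values is to pool the transfrags and discard any sequence that is an exact duplicate, a reverse complement, or fully contained in a longer sequence. The Python sketch below shows only that idea; it is not one of the merging approaches evaluated here. The function names, the k values and the toy sequences are hypothetical, and the quadratic containment check is suitable only for small inputs.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_assemblies(assemblies):
    """Pool transfrags from several k-mer assemblies into a non-redundant list.

    assemblies maps a k value to a list of sequences. A sequence is dropped
    when it, or its reverse complement, is contained in a longer sequence
    that has already been kept. Toy O(n^2) sketch, not a production method.
    """
    pooled = {s.upper() for seqs in assemblies.values() for s in seqs}
    # longest first, so that shorter fragments are tested against longer ones
    ordered = sorted(pooled, key=lambda s: (-len(s), s))
    kept = []
    for seq in ordered:
        rc = reverse_complement(seq)
        if any(seq in longer or rc in longer for longer in kept):
            continue  # redundant with a sequence already kept
        kept.append(seq)
    return kept


# Hypothetical toy input: three assemblies at k = 21, 25 and 29.
toy = {
    21: ["ATGGCCAAGTTT", "GGGCCC"],
    25: ["GGCCAAG", "AAACTTGGCCAT"],
    29: ["TTTTACGA"],
}
merged = merge_assemblies(toy)
```

In this toy input, `ATGGCCAAGTTT` is the reverse complement of `AAACTTGGCCAT`, and `GGCCAAG` is contained in it on the opposite strand, so only one representative of that locus survives.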

  Copyright ©2011 Agri Genomics, All rights reserved.