- Position: Senior Lecturer, Principal Investigator
- Telephone: +27-(0)21-959 3910
- Facsimile: +27-(0)21-959 2512
- Email: junaid(at)sanbi ac za
MAIN RESEARCH THEME: Next Generation Sequencing and Semantic Discovery in Biomedical Genomics
Lab members (from left): Emad Fadhal, Mahjoubeh Jalali, Michael Berry, Junaid Gamieldien, Azeez Fatai, Darlington Mapiye (graduated MSc cum laude), Fanechka Esterhuysen
Extant biomedical knowledge presents significant opportunities for disambiguating and contextualizing leads identified in NGS experiments if the challenges presented by the volume and complexity of the information can be addressed.
Our core research focuses on the ontology-driven semantic integration of large amounts of biomedical information using a knowledge representation technique known as a semantic network, which structures information as a network of annotated ‘nodes’ and ‘edges’. Semantic networks can be used for automated logical deduction, which is useful when attempting to identify non-obvious links between biological concepts and/or scenarios. At the core of our methodology is a next-generation graph database, which greatly simplifies the integration of complex information such as interspecies gene orthology, pathways, bio-entity interactions, gene-to-disease associations, relationships extracted through text mining, and even raw data/patterns from high-throughput experiments into a large on-disk semantic network. We have also developed a user-friendly query language that simplifies complex querying and leverages the traversal functionality of the graph database technology. Furthermore, as our semantic model is essentially schema-less, it easily accommodates the addition of novel information types, which then become immediately accessible for exploration.
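The idea of annotated nodes, typed edges and traversal-based deduction can be illustrated with a minimal in-memory sketch. This is not our graph database or its query language: the class `SemanticNetwork`, its methods, the relation names and the placeholder identifiers (`GENE_A`, `PATHWAY_X`, `DISEASE_Y`) are all assumptions made purely for illustration.

```python
from collections import deque


class SemanticNetwork:
    """Toy semantic network: annotated nodes joined by typed, directed edges."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of free-form annotations (no fixed schema)
        self.edges = {}  # node id -> list of (relation, target id)

    def add_node(self, node_id, **annotations):
        # Any annotation keys are accepted, so new information types need no schema change.
        self.nodes.setdefault(node_id, {}).update(annotations)
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, relation, target):
        for node_id in (source, target):
            self.add_node(node_id)
        self.edges[source].append((relation, target))

    def find_path(self, start, goal, max_hops=4):
        """Breadth-first search for the shortest chain of (source, relation, target) steps."""
        if start not in self.nodes or goal not in self.nodes:
            return None
        queue = deque([(start, [])])
        visited = {start}
        while queue:
            node, path = queue.popleft()
            if node == goal:
                return path
            if len(path) >= max_hops:
                continue
            for relation, target in self.edges[node]:
                if target not in visited:
                    visited.add(target)
                    queue.append((target, path + [(node, relation, target)]))
        return None


# Hypothetical placeholder entities, not real annotations.
net = SemanticNetwork()
net.add_node("GENE_A", kind="gene", species="human")
net.add_node("PATHWAY_X", kind="pathway")
net.add_node("DISEASE_Y", kind="disease")
net.add_edge("GENE_A", "participates_in", "PATHWAY_X")
net.add_edge("PATHWAY_X", "implicated_in", "DISEASE_Y")

path = net.find_path("GENE_A", "DISEASE_Y")
for source, relation, target in path:
    print(source, relation, target)
```

Here the gene has no direct edge to the disease, yet the traversal recovers an indirect link through the shared pathway, which is the kind of non-obvious connection the semantic network is meant to expose.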
We have developed a prototype focused on human health, which seamlessly integrates hundreds of thousands of human, mouse and rat gene, gene-to-disease, gene-to-phenotype and gene-to-pathway relationships, and uses multiple bio-ontologies as ‘anchors’ for semantic integration. While one of the primary objectives of the larger project is to enable in silico experimentation through complex querying, another is to assist in the discovery of genotype-to-phenotype associations from high-throughput experiments.
Figure: The semantic model of our prototype graph database for human health genomics and disease biomarker discovery, which simplifies cross-species and cross-knowledge-domain genotype-to-phenotype exploration.
APPLIED PROJECT: Application of Whole Exome Sequencing and Knowledge Discovery to Identify Disease Genes.
High-throughput technologies such as next-generation sequencing make it possible to produce multiple gene candidates that may be of biological or medical interest.
While rapid advances have been made in methods that predict whether an identified SNP may have a functional effect, there is often still a need to identify non-obvious links between the SNP and the phenotype/disease of interest. For example, if a ‘functional’ SNP is discovered in a gene that is not well known to be involved in the phenotype/disease being studied, it is relatively easy to explore our human health-related semantic database for transitive relationships that may explain the biological contribution of the gene to the phenotype, if any. Alternatively, evidence of known associations of the gene with ‘surrogate’ or ‘secondary’ phenotypes associated with a disease (e.g. insulin resistance in heart disease) can also be explored. This can be achieved directly from human gene-phenotype evidence, or transitively via model organism evidence (gene knockout phenotypes). The latter approach has the potential to identify rare mutations associated with a phenotype that would otherwise be missed.
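The two evidence routes described above (direct human gene-phenotype evidence, and transitive evidence via model organism knockout phenotypes) can be sketched as a simple lookup. This is an illustrative assumption only, not our pipeline: the dictionaries, the function `evidence_chains` and every identifier in them are hypothetical placeholders rather than real annotations.

```python
# All identifiers below are hypothetical placeholders, not real annotations.
HUMAN_GENE_PHENOTYPES = {"HUMAN_GENE_1": {"PHENOTYPE_P"}}  # direct human evidence
ORTHOLOGS = {"HUMAN_GENE_2": {"MOUSE_GENE_2"}}             # human gene -> model organism orthologs
KNOCKOUT_PHENOTYPES = {"MOUSE_GENE_2": {"PHENOTYPE_P"}}    # model organism knockout phenotypes
DISEASE_SURROGATES = {"DISEASE_D": {"PHENOTYPE_P"}}        # disease -> surrogate phenotypes


def evidence_chains(gene, disease):
    """Return chains linking a candidate gene to a disease via its surrogate phenotypes."""
    surrogates = DISEASE_SURROGATES.get(disease, set())
    chains = []

    # Route 1: the human gene is itself annotated with a surrogate phenotype.
    for phenotype in sorted(HUMAN_GENE_PHENOTYPES.get(gene, set()) & surrogates):
        chains.append(("direct", gene, phenotype, disease))

    # Route 2: an ortholog shows a surrogate phenotype when knocked out.
    for ortholog in sorted(ORTHOLOGS.get(gene, set())):
        shared = KNOCKOUT_PHENOTYPES.get(ortholog, set()) & surrogates
        for phenotype in sorted(shared):
            chains.append(("via_ortholog", gene, ortholog, phenotype, disease))

    return chains


for candidate in ("HUMAN_GENE_1", "HUMAN_GENE_2", "HUMAN_GENE_3"):
    print(candidate, evidence_chains(candidate, "DISEASE_D"))
```

In this sketch, `HUMAN_GENE_2` has no direct human phenotype evidence and would be overlooked by the first route alone; it is recovered only through the knockout phenotype of its ortholog.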
As an exemplar study, we will be developing an exome sequencing and knowledge discovery pipeline to identify the potential roles of known and novel SNPs/mutations in the development of cardiovascular disease in South African patients.