I am interested in studying gene regulatory networks through the different approaches of systems biology, both statistical and experimental. For a particular gene to be expressed, its promoter (a regulatory region) must be bound by cell- or tissue-specific proteins. The number and type of proteins required may vary with the physical or physiological condition of the organism. This suggests that the information specifying which proteins are needed, and how many, lies in the DNA sequence itself and in protein-DNA and protein-protein interactions. Once we know the nature of these regulatory elements, we can predict which protein will bind to a particular DNA binding site under a given physiological condition. Such computational knowledge of gene regulation can then guide experimental work. Analyzing the individual elements within promoter sites will improve our understanding of promoter strength and regulation, and therefore of gene expression.
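As an illustration of how binding-site prediction from sequence can work, the sketch below scans a promoter sequence with a position weight matrix and reports windows whose log-odds score exceeds a threshold. It is a minimal example under stated assumptions: the count matrix, the uniform background frequency, the pseudocount and the threshold are all hypothetical values chosen for illustration, not measured data.

```python
import math

# Hypothetical base counts for a 4-bp binding site (illustrative, not real data).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 0, 9, 1],
    "G": [1, 10, 0, 0],
    "T": [1, 0, 1, 8],
}
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 1.0   # assumed smoothing value to avoid log(0)


def log_odds_matrix(counts, pseudocount=PSEUDOCOUNT, background=BACKGROUND):
    """Convert per-position base counts into log2-odds scores."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        column = {}
        for b in "ACGT":
            freq = (counts[b][i] + pseudocount) / total
            column[b] = math.log2(freq / background)
        matrix.append(column)
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    for hit in scan("TTAGCTGG", pwm, threshold=4.0):
        print(hit)
```

In practice the matrix would come from curated binding-site data, and the threshold would be calibrated against known sites rather than fixed by hand.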
1. Correlational studies of transcriptional and splice-site regulatory elements:
For a long time, researchers focused most of their attention on protein-coding genes and proteins. With the completion of the human and mouse genomes and the accumulation of data on the mammalian transcriptome, the focus is now shifting to non-coding DNA sequence, RNA-coding genes and their transcripts (Shabalina and Spiridonov, 2004). In modern eukaryotes, the transcription and processing of mRNA are tightly coupled with intron splicing and/or exon recognition. The task here is to correlate, at the DNA sequence level, the regulatory elements involved in transcriptional regulation with those involved in splicing regulation (Wang et al., 2004; Yeo et al., 2004). Studying this should give a better understanding of the evolutionary forces that drive the splicing mechanism to produce diverse proteins.
2. Comparative analysis of promoters of regulatory RNAs:
RNA interference (RNAi) was first recognized in C. elegans as a response to dsRNA that leads to sequence-specific gene silencing (Fire et al., 1998; Sayda et al., 2001). RNA silencing appears to be present in most, if not all, eukaryotic organisms. The common key players in RNA silencing are small RNAs of 21-28 nt in length. Two classes of small RNAs are involved: small interfering RNAs (siRNAs) and microRNAs (miRNAs). The major difference between them is their biogenesis; otherwise they are very similar. siRNAs are involved in sequence-specific mRNA degradation, whereas miRNAs repress translation of the target mRNA. In general, miRNAs are endogenous, whereas siRNAs can be endogenous or exogenous in origin. Both act on target genes to regulate their expression. There is considerable scope for finding new potential regulatory RNAs using bioinformatics approaches. Another important task is to study the regulatory regions of the siRNA/miRNA genes themselves (Molly et al., 2006). Regulatory RNAs are expressed much like protein-coding genes: they are transcribed by the RNA pol II machinery (for example, plant miRNAs derived from introns are transcribed by RNA pol II). This suggests that the regulatory regions of regulatory RNA genes may be organized similarly to those of protein-coding genes. The crucial component in the analysis of a miRNA promoter region is the accurate identification of the transcription start site (TSS).
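Once a TSS has been identified, the promoter region to analyze can be extracted relative to it. The sketch below shows one way to do this in a strand-aware manner. It is only an illustration: the function name `promoter_window`, the 0-based coordinate convention and the default window sizes are assumptions made for the example, not part of any established tool.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(chromosome, tss, strand, upstream=1000, downstream=100):
    """Extract the region around a TSS, oriented 5'->3' for the gene.

    chromosome: forward-strand sequence as a string
    tss:        0-based forward-strand index of the TSS (assumed convention)
    strand:     "+" or "-"
    The TSS base sits at index `upstream` of the returned window, unless the
    window was clipped at a chromosome end.
    """
    if strand == "+":
        start = max(0, tss - upstream)
        end = min(len(chromosome), tss + downstream)
        return chromosome[start:end]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher forward coordinates.
        start = max(0, tss - downstream + 1)
        end = min(len(chromosome), tss + upstream + 1)
        return reverse_complement(chromosome[start:end])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy = "AAAACCCCGGGGTTTT"  # toy sequence, not real data
    print(promoter_window(toy, 8, "+", upstream=4, downstream=2))
    print(promoter_window(toy, 7, "-", upstream=4, downstream=2))
```

Returning minus-strand windows as reverse complements means that downstream motif searches can treat every promoter in the same orientation.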
3. An Integrated Gene Expression Database (IGEDB) - retrieval system
Aim of the project: to build an RNA sequence retrieval system, based on database integration, for motif discovery and for structure and function prediction in biomedical applications such as drug targeting and the selection of gene silencing targets.
Today the challenge is to construct large-scale genome databases of heterogeneous data that are user-friendly. The main goal of an integrated database system is to allow users to access a set of distinct and heterogeneous databases in a homogeneous manner, using a set of common tools. This is difficult because international standards for biological databases are lacking. The successful use of genome information depends on the comprehensive analysis of genome data, the storage of genome and genome-associated data, tools for inter-genome comparison and knowledge transfer (data processing), and the iterative enrichment of information resources (knowledge generation) in line with current research interests. Database integration involves developing and implementing procedures for data standardization, data entry, curation and reporting. It will create authoritative data dictionaries of nomenclatural and bibliographic information linked to diverse collection, observational and spatial data. It will be delivered through either centralized or distributed systems and will be subject to standard procedures that protect data integrity, security and access rights. These developments will enable more effective resolution of within- and between-database queries. Database integration will necessarily be an iterative and ongoing process, and the initial emphasis is on the treatment of existing data and the standards used for them. In the medium term I will fully adopt data standards that are consistent with international standards. Although I aim to develop fully web-enabled access, I especially want to develop a toolkit for downloading large datasets based on user-specified criteria.
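The idea of accessing heterogeneous sources in a homogeneous manner can be sketched as follows: each source gets a small adapter that maps its records into one common schema, and all queries run against that schema. This is a minimal illustration only. The `GeneRecord` fields, the two source formats and the query criteria are hypothetical and do not describe the actual IGEDB design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Common schema shared by all sources (hypothetical fields)."""
    gene_id: str
    organism: str
    sequence: str
    source: str


def from_source_a(row):
    """Adapter for a hypothetical source that delivers dictionaries."""
    return GeneRecord(
        gene_id=row["id"].strip(),
        organism=row["species"].strip(),
        sequence=row["seq"].strip().upper(),
        source="source_a",
    )


def from_source_b(line):
    """Adapter for a hypothetical tab-delimited source:
    organism <TAB> gene_id <TAB> sequence."""
    organism, gene_id, sequence = line.rstrip("\n").split("\t")
    return GeneRecord(
        gene_id=gene_id.strip(),
        organism=organism.strip(),
        sequence=sequence.strip().upper(),
        source="source_b",
    )


def query(records, organism=None, min_length=0):
    """Filter standardized records by user-specified criteria."""
    return [
        r for r in records
        if (organism is None or r.organism == organism)
        and len(r.sequence) >= min_length
    ]


if __name__ == "__main__":
    records = [
        from_source_a({"id": "geneA", "species": "Mus musculus", "seq": "acgu"}),
        from_source_b("Homo sapiens\tgeneB\tACGUACGU\n"),
    ]
    for record in query(records, organism="Homo sapiens"):
        print(record)
```

Keeping the adapters separate from the query logic means that a new source can be added without changing how users search or download data.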
The ultimate goal is to create a detailed (evolutionary) model of gene regulation.