DNA microarrays have been widely used to measure the relative abundance of mRNA for thousands of genes under various experimental conditions. Unsupervised methods for classifying gene expression profiles, such as hierarchical clustering and self-organizing maps, have previously been applied to such data. However, researchers employ DNA microarrays not merely to monitor mRNA levels but to address biomedical questions, such as the mechanisms behind a phenomenon, the pathways involved in a phenotype, the fingerprints of a biological state, and the relationships between genes. Knowledge has typically been applied to array data post hoc, through manual visual inspection, an approach that is neither systematic nor efficient. To address this issue, we have developed GABRIEL (Genetic Analysis By Rules Incorporating Expert Logic), a platform that encodes expert knowledge in software to help users apply that knowledge in microarray analysis.

The sections below describe the algorithms and rules that we have implemented within GABRIEL.

Experiment Review

The experiment review rules allow you to calculate the correlation between samples, which measures how similar two samples are based on the red/green ratio observed for spots on the microarrays. With this function, you can find out how reproducible the results are by calculating the correlation between the repeats of the same experimental condition. You can also examine the similarity between two experiments carried out with separate lines or types, between two experimental conditions, etc.
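The comparison described above can be sketched as follows. This is a minimal illustration, not GABRIEL's own code: it assumes the similarity measure is the Pearson correlation of log2-transformed red/green ratios (the text does not say which correlation or transformation is used), and the function names and example ratios are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def sample_correlation(ratios_a, ratios_b):
    """Correlate two arrays spot by spot using log2 red/green ratios.

    Assumption: spots that are missing (None) or non-positive in either
    sample are skipped rather than imputed.
    """
    pairs = [(math.log2(a), math.log2(b))
             for a, b in zip(ratios_a, ratios_b)
             if a is not None and b is not None and a > 0 and b > 0]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson(xs, ys)

# Hypothetical red/green ratios for the same five spots on two replicate arrays.
replicate_1 = [2.0, 0.5, 1.0, 4.0, 0.25]
replicate_2 = [1.8, 0.6, 1.1, 3.5, None]
r = sample_correlation(replicate_1, replicate_2)
print(f"correlation between replicates: {r:.3f}")
```

A value close to 1 between repeats of the same condition suggests good reproducibility; the same call can be used to compare two different conditions.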

Physical Cluster Finding Algorithm

The physical cluster finding algorithm lets you specify the minimum number of genes that a continuity must contain. It defines the allowed range by the number of nucleotide base pairs the continuity may span, rather than by the number of genes allowed as a gap. For example, you can search for all continuities with at least 4 genes within 1 megabase and with a correlation coefficient larger than 0.6. Because the range is measured in base pairs, this algorithm can be used for organisms in which not all genes have been characterized, such as human.
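One possible reading of this search is sketched below. The text does not say which genes the correlation coefficient is computed between, so this sketch assumes that each pair of neighboring genes in a continuity must have correlated expression profiles. The function names, the tuple layout, and the example genes are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def find_physical_clusters(genes, min_genes=4, max_span_bp=1_000_000, min_corr=0.6):
    """Find runs of neighboring, co-expressed genes on ONE chromosome.

    genes: list of (name, position_bp, expression_profile) tuples.
    Assumption: a run grows while the next gene is within max_span_bp of
    the first gene and correlates with its immediate neighbor above min_corr.
    Runs fully contained in an already reported run are skipped.
    """
    genes = sorted(genes, key=lambda g: g[1])
    clusters = []
    last_end = -1
    for i in range(len(genes)):
        j = i
        while (j + 1 < len(genes)
               and genes[j + 1][1] - genes[i][1] <= max_span_bp
               and pearson(genes[j][2], genes[j + 1][2]) > min_corr):
            j += 1
        if j - i + 1 >= min_genes and j > last_end:
            clusters.append([g[0] for g in genes[i:j + 1]])
            last_end = j
    return clusters

# Hypothetical genes: name, chromosomal position in bp, expression profile.
example_genes = [
    ("g1", 100_000, [1.0, 2.0, 3.0, 4.0]),
    ("g2", 300_000, [1.1, 2.1, 2.9, 4.2]),
    ("g3", 550_000, [0.9, 1.8, 3.2, 3.9]),
    ("g4", 900_000, [1.2, 2.2, 3.1, 4.5]),
    ("g5", 2_500_000, [1.0, 2.0, 3.0, 4.0]),  # co-expressed but too far away
    ("g6", 2_600_000, [4.0, 3.0, 2.0, 1.0]),  # nearby g5 but anti-correlated
]
clusters = find_physical_clusters(example_genes)
print(clusters)
```

With the default settings (at least 4 genes within 1 megabase, correlation above 0.6), only the first four example genes form a continuity.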

Genes in the physical clusters found can then be used as "probands" to identify genes with related expression profiles, as described below.

Pattern-based rules

Pattern based rules allow you to specify parameters that define a pattern of interest; you can vary these parameters and explore the results using different settings. GABRIEL also calculates the false discovery rates and false negative rates for the patterns it identifies.

A related function is the t-score pattern rules, which allow users to specify patterns of t-scores. The calculation of t-scores enables assessment of statistical reliability based on the consistency of the data among repeats. For example, if you have 4 replicates of results obtained following an experimental treatment and 4 replicates of untreated controls, you can use the t-score pattern rules to identify the genes showing a specified expression pattern. In addition to finding genes having a particular expression pattern, GABRIEL can determine whether this group of genes demonstrates significant physical clustering on the chromosome as compared with a random group of genes.
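The 4-versus-4 example can be sketched as follows. This is an illustration under stated assumptions: it uses Welch's two-sample t-statistic (the text does not say which t-score variant GABRIEL computes), the "pattern" is simply a minimum t-score, and the function names, threshold, and data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(treated, control):
    """Welch's t-score for the difference in means of two replicate groups."""
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    if se == 0:
        raise ValueError("t-score is undefined when both groups have zero variance")
    return (mean(treated) - mean(control)) / se

def genes_matching_t_pattern(data, min_t=3.0):
    """Return {gene: t} for genes whose t-score is at least min_t.

    data maps a gene name to (treated_replicates, control_replicates).
    Assumption: the pattern of interest is 'consistently higher in treated'.
    """
    result = {}
    for gene, (treated, control) in data.items():
        t = welch_t(treated, control)
        if t >= min_t:
            result[gene] = t
    return result

# Hypothetical log ratios: 4 treated and 4 untreated replicates per gene.
data = {
    "geneA": ([2.1, 1.9, 2.0, 2.2], [0.1, -0.1, 0.0, 0.2]),   # large, consistent change
    "geneB": ([0.5, -0.4, 0.3, -0.2], [0.1, 0.2, -0.3, 0.0]),  # no real change
    "geneC": ([1.5, -1.0, 2.5, 0.2], [0.0, 0.1, -0.1, 0.0]),   # large but inconsistent
}
selected = genes_matching_t_pattern(data)
for gene, t in selected.items():
    print(f"{gene}: t = {t:.1f}")
```

In this sketch geneC has a sizeable mean change but is not selected, because its replicates disagree; this is the sense in which t-scores reflect consistency among repeats.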

Proband-based rules

Proband-based rules let you specify a proband, i.e., a gene with a known behavior, and identify the genes with related expression profiles. First, you can find the genes with similar expression profiles (e.g., showing a correlation coefficient greater than 0.7). GABRIEL calculates the false discovery rate and false negative rate of the results. In addition, you can find the genes showing contrary expression profiles (i.e., when proband expression goes up, their expression goes down). If you have a time-course experiment, you can further search for genes whose expression is similar to a time-delayed or time-advanced version of the proband profile.
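The three kinds of relationship described above can be sketched as follows. This is a simplified illustration, assuming Pearson correlation as the similarity measure and a shift of one time point for the delayed and advanced cases; it does not compute false discovery or false negative rates, and all names and profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(proband, profile, lag):
    """Correlate a profile with the proband shifted by `lag` time points.

    lag > 0: the profile follows the proband (time-delayed).
    lag < 0: the profile precedes the proband (time-advanced).
    """
    if lag > 0:
        return pearson(proband[:-lag], profile[lag:])
    if lag < 0:
        return pearson(proband[-lag:], profile[:lag])
    return pearson(proband, profile)

def classify_against_proband(proband, profiles, min_corr=0.7, lag=1):
    """Group genes by their relationship to the proband profile."""
    groups = {"similar": [], "contrary": [], "delayed": [], "advanced": []}
    for gene, profile in profiles.items():
        r = pearson(proband, profile)
        if r > min_corr:
            groups["similar"].append(gene)
        elif r < -min_corr:
            groups["contrary"].append(gene)
        elif lagged_correlation(proband, profile, lag) > min_corr:
            groups["delayed"].append(gene)
        elif lagged_correlation(proband, profile, -lag) > min_corr:
            groups["advanced"].append(gene)
    return groups

# Hypothetical time course: the proband shows a pulse at the third time point.
proband = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
profiles = {
    "geneS": [0.1, 1.2, 2.8, 0.9, 0.1, 0.0],  # tracks the proband
    "geneC": [3.0, 2.0, 0.0, 2.0, 3.0, 3.0],  # mirror image of the proband
    "geneD": [0.0, 0.0, 1.0, 3.0, 1.0, 0.0],  # same pulse, one point later
    "geneA": [1.0, 3.0, 1.0, 0.0, 0.0, 0.0],  # same pulse, one point earlier
    "geneU": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # unrelated oscillation
}
groups = classify_against_proband(proband, profiles)
print(groups)
```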

Gene Ontology and Functional Categorization Algorithm

With the Gene Ontology and Functional Categorization Algorithm, one can find how many genes in a given gene list are annotated with each Gene Ontology term. The gene list can be the result of another analysis algorithm, such as the pattern-based rules or the proband-based rules. For example, one can first use a pattern-based rule to identify a group of genes over-expressed in the condition being studied, save the results as a text file, and then copy the gene accession numbers or Clone IDs from that file and paste them into the text area. By doing so, one can find out which biological processes, molecular functions, and cellular components the over-expressed genes are associated with.

Gene Ontology comes from the Gene Ontology Consortium and is a controlled terminology for eukaryotic annotations. However, the Gene Ontology Algorithm is not limited to eukaryotes; it can be used for prokaryotes as well. Prokaryotic gene annotation can be used to categorize genes, and the Gene Ontology Algorithm will count the number of genes in each category and return the statistical significance.

For E. coli:
The Gene Ontology and Functional Categorization Algorithm not only counts the number of genes in each category; it also analyzes whether there is significant enrichment of any specific group. The algorithm returns the number of genes expected in each category by chance and the statistical significance of the enrichment of each category. For example, it may report that 40 genes are associated with cell cycle, that 20 genes would be expected to be associated with cell cycle on a random basis, and that the false positive rate is 0.01. From this, one concludes that cell cycle is significantly enriched in this group of genes.
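The expected count and significance can be illustrated with a small sketch. It assumes a one-sided hypergeometric test, which is a common choice for category enrichment but is not confirmed by the text as the statistic GABRIEL reports. The observed and expected counts (40 and 20) follow the example above; the genome size, category size, and list size are made up so that the expected count comes out to 20.

```python
from math import comb

def enrichment(list_size, hits_in_list, genome_size, hits_in_genome):
    """Expected category count and one-sided hypergeometric p-value.

    list_size:      number of genes in the submitted list
    hits_in_list:   genes in the list annotated with the category
    genome_size:    number of annotated genes in the background
    hits_in_genome: background genes annotated with the category
    Returns (expected, p), where p is the probability of seeing at least
    hits_in_list category genes in a random list of the same size.
    """
    expected = list_size * hits_in_genome / genome_size
    total = comb(genome_size, list_size)
    upper = min(list_size, hits_in_genome)
    tail = sum(comb(hits_in_genome, k)
               * comb(genome_size - hits_in_genome, list_size - k)
               for k in range(hits_in_list, upper + 1))
    return expected, tail / total

# Hypothetical background: 4000 genes, 400 of them annotated "cell cycle".
# A list of 200 genes contains 40 cell-cycle genes.
expected, p = enrichment(list_size=200, hits_in_list=40,
                         genome_size=4000, hits_in_genome=400)
print(f"observed 40, expected {expected:.0f}, p = {p:.2e}")
```

Exact integer arithmetic with `math.comb` avoids rounding problems for small tails; for many categories, a multiple-testing correction would normally be applied on top of these p-values.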

For Human and Mouse:
GO::TermFinder, developed by Sherlock, Boyle et al., has been integrated into this method for the analysis of human and mouse data. The output includes a statistical analysis with p-values for each GO category.

Physical Clustering Significance

The physical clustering significance algorithm helps users examine the physical distribution of genes on the chromosomes. The algorithm takes a list of genes and estimates the significance of their physical distribution on the chromosomes. For example, you can enter a list of genes from the output of a GABRIEL pattern-based rule. The algorithm then counts the number of genes on each chromosome, and also counts the genes that fall within the distance threshold of one another on the chromosome. For example, the algorithm may identify 2 genes that lie within 1 megabase of each other. The algorithm then estimates the false positive rate of having such a physical distribution on the chromosomes.
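One way to estimate such a significance is sketched below. It is an assumption-laden illustration: the text does not describe how GABRIEL derives its false positive rate, so this sketch uses an empirical p-value obtained by comparing the gene list against random gene lists of the same size. The per-chromosome tally is omitted, and the gene names, positions, and parameters are hypothetical.

```python
import random
from collections import defaultdict

def count_clustered(gene_list, positions, max_dist_bp=1_000_000):
    """Count listed genes that have another listed gene within max_dist_bp
    on the same chromosome. positions maps gene -> (chromosome, position_bp)."""
    by_chrom = defaultdict(list)
    for gene in gene_list:
        chrom, pos = positions[gene]
        by_chrom[chrom].append(pos)
    clustered = 0
    for pos_list in by_chrom.values():
        pos_list.sort()
        for i, p in enumerate(pos_list):
            near_prev = i > 0 and p - pos_list[i - 1] <= max_dist_bp
            near_next = i + 1 < len(pos_list) and pos_list[i + 1] - p <= max_dist_bp
            if near_prev or near_next:
                clustered += 1
    return clustered

def clustering_significance(gene_list, positions, max_dist_bp=1_000_000,
                            n_random=1000, seed=0):
    """Return (observed, p): the observed clustered-gene count and the
    fraction of random same-size gene lists that are at least as clustered."""
    rng = random.Random(seed)
    observed = count_clustered(gene_list, positions, max_dist_bp)
    all_genes = sorted(positions)
    at_least = sum(
        count_clustered(rng.sample(all_genes, len(gene_list)),
                        positions, max_dist_bp) >= observed
        for _ in range(n_random))
    # Add-one correction so the estimate is never exactly zero.
    return observed, (at_least + 1) / (n_random + 1)

# Hypothetical genome: 5 chromosomes, 200 genes each, one gene every 500 kb.
positions = {f"chr{c}_g{i}": (c, i * 500_000)
             for c in range(1, 6) for i in range(200)}
gene_list = ["chr1_g10", "chr1_g11", "chr1_g12",
             "chr2_g50", "chr2_g51", "chr4_g100"]
observed, p = clustering_significance(gene_list, positions)
print(f"{observed} of {len(gene_list)} genes are clustered, empirical p = {p:.4f}")
```

A fixed seed keeps the random comparison reproducible; increasing `n_random` gives a finer-grained estimate at the cost of run time.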

We have also added PCA/SVD, ANOVA, commonality analysis of gene lists, and a few other algorithms to our set of analysis methods. For more information on how people have utilized our analysis methods, please visit our publications page, and if you'd like to try out the program, please contact us.

2002-2006 Stanford University
Last updated: November 9, 2006