ChEA Tutorial

Case study 1: Linking breast cancer signature genes to transcription factors

This case study demonstrates how ChEA can be used for a comparative analysis of two reports that identified biomarker sets for invasive breast cancer from microarray studies. In two independent publications, the authors used independently collected compendia of mRNA expression microarrays from patients with breast cancer tumors to find a signature set of genes that can differentiate between benign and malignant breast cancers [1 PMID: 15721472, 2 PMID: 11823860]. Both studies produced lists of genes selected as feature biomarkers. Surprisingly, the two lists, containing 162 and 73 genes, share only two genes, an overlap that is not statistically significant. However, when each gene list is used separately as input for ChEA, both show enrichment for SMAD2/3 gene targets as detected in human HaCaT cells (see PMID: 18955504). The p-value is less than 1.0E-05 for the list of 73 genes (the third most significant among all experiments in the ChIP-X database) and also less than 1.0E-05 for the list of 162 genes (the fourth most significant ChIP-X experiment). When the lists are combined, SMAD2/3 is at the top of the list of enriched factors with an improved p-value of less than 1.0E-09 (see Tables 1, 2, and 3 below).
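To make the overlap claim concrete, the sketch below computes a one-sided hypergeometric p-value for the overlap between two gene lists. It is an illustration only, not ChEA's implementation: the function name and the background size of 20,000 genes are assumptions chosen for the example, and the result depends on that background.

```python
from math import comb


def overlap_pvalue(background, list_a, list_b, shared):
    """P(overlap >= shared) when list_b is drawn at random from the background.

    background: number of genes that could have been selected (assumed value)
    list_a, list_b: sizes of the two gene lists
    shared: number of genes observed in both lists
    """
    upper = min(list_a, list_b)
    # Sum the exact integer counts of all outcomes at least as extreme.
    favourable = sum(
        comb(list_a, i) * comb(background - list_a, list_b - i)
        for i in range(shared, upper + 1)
    )
    return favourable / comb(background, list_b)


# Lists of 162 and 73 genes sharing two genes, as in this case study.
# The 20,000-gene background is a hypothetical choice, not a value from the text.
p = overlap_pvalue(20000, 162, 73, 2)
print(f"overlap p-value: {p:.3f}")
```

With a background of this size, an overlap of two genes is close to what chance alone would produce, which is in line with the statement above that the overlap is not significant.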

Table 1 - ChEA results for van't Veer et al.

Table 2 - ChEA results for Wang et al.

Table 3 - ChEA results for the combined list

Careful examination of the 35 genes identified as overlapping among the Smad2/3 targets from the two studies points to several genes that have previously been reported to play a role in breast cancer metastasis. In particular, MMP9 and CD44 are both strongly implicated in breast cancer metastasis: in a GeneRIF search with the query "breast cancer", MMP9 and CD44 are listed in 23 and 17 of the returned articles, respectively (see Table 4 below).

Table 4 - Genes returned from a GeneRIF search using Lists2Networks

MMP9 is a metalloprotease that digests the extracellular matrix during invasion, whereas changes in CD44 expression likely play a role in evading the host immune response. The results from the ChEA analysis suggest that TGF-beta/SMAD2/3 signaling plays a dominant role in breast cancer metastasis, and they can be used to further explain the origins of the discrepancy between the original two studies. It is well established that TGF-beta/SMAD2/3 signaling plays an important role in breast cancer metastasis [3 PMID: 17295676, 4 PMID: 11809701, 5 PMID: 17695423]. However, the ChEA analysis combined with the microarray profiling provides additional unbiased, global support for this hypothesis. Our results also complement a network analysis approach that was applied to the same data using protein interactions. Chuang et al. [6 PMID: 17940530] connected the breast cancer biomarkers identified by the two independent studies using known protein-protein interactions and found that a SMAD2/3 subnetwork, among other subnetworks, is up-regulated in metastasized tumors. Here we linked those results to transcriptional regulation evidence from ChIP-X studies. Gene expression profiles from different cancers, collected from patients or cell types, can now be linked to a transcription factor regulatory signature using ChEA. Such a signature may hint directly at the molecular regulatory mechanisms altered in a specific cancer subtype.

1. Wang Y, Klijn J, Zhang Y, Sieuwerts A, Look M, Yang F, Talantov D, Timmermans M, Meijer-van Gelder M, Yu J: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. The Lancet 2005, 365(9460):671-679. PMID: 15721472
2. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002, 415(6871):530-536. PMID: 11823860
3. Liapis G, Mylona E, Alexandrou P, Giannopoulou I, Nikolaou I, Markaki S, Keramopoulos A, Nakopoulou L: Effect of the different phosphorylated Smad2 protein localizations on the invasive breast carcinoma phenotype. Apmis 2007, 115(2):104-114. PMID: 17295676
4. Xie W, Mertens JC, Reiss DJ, Rimm DL, Camp RL, Haffty BG, Reiss M: Alterations of Smad Signaling in Human Breast Carcinoma Are Associated with Poor Outcome: A Tissue Microarray Study. Cancer Res 2002, 62(2):497-505. PMID: 11809701
5. Koumoundourou D KT, Zolota V, Tzorakoeleftherakis E, Ravazoula P, Vassiliou V, Kardamakis D, Varakis J.: Prognostic Significance of TGF-beta-1 and pSmad2/3 in Breast Cancer Patients with T1-2,N0 Tumours. Anticancer Research 2007, 27(4C):2613-2620. PMID: 17695423
6. Chuang H-Y, Lee E, Liu Y-T, Lee D, Ideker T: Network-based classification of breast cancer metastasis. Mol Syst Biol 2007, 3. PMID: 17940530

Case study 2: Re-analysis of gene over-expression followed by mRNA profiling of mESCs

We used ChEA to re-analyze results from a report in which the authors over-expressed 50 transcription factors, one by one, in mouse embryonic stem cells (mESCs) and then measured the effect of each perturbation on gene expression using mRNA microarrays [1 PMID: 19796622]. The 50 transcription factors include all the well-known mESC regulators, i.e., Oct4, Nanog, and Sox2. The study identified Cdx2 as the transcription factor with the most dramatic effect when over-expressed, and it was therefore selected for a ChIP-seq experiment. We re-analyzed the results from the Nishiyama et al. study by submitting to ChEA, for each of the 50 perturbations, the top 500 genes that changed the most compared to the control. Surprisingly, the two studies that reported ChIP-X results for Suz12 binding appeared as the most statistically enriched for almost all of the perturbations (Fig. 1, Table 5 below).

Table 5 - Nishiyama et al. differentially expressed genes overlap with ChEA

click to download table

Figure 1 - Most significant overlapping TFs with the differentially expressed genes found by Nishiyama et al.

Fig. 1 ChEA analysis of the top 500 genes that changed in their mRNA expression in each of 50 over-expression experiments of single transcription factors in mESCs. The p-values from ChEA for each transcription factor over-expression perturbation are inverse log transformed. The top ranked transcription factors reported by ChEA are labeled (peaks in the bar graph). For clarity, only those factors whose p-value reached the 1.0E-22 significance level are labeled. Full results are available in Table 5.
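As a rough illustration of the transform used for the figure, the sketch below converts p-values to a bar height and applies the labeling cutoff. It assumes that "inverse log transformed" means the negative base-10 logarithm and that a factor is labeled when its p-value is at or below 1.0E-22; both are readings of the caption rather than confirmed details, and the function names are hypothetical.

```python
import math

LABEL_THRESHOLD = 1.0e-22  # cutoff quoted in the Fig. 1 caption


def bar_height(p_value):
    """Negative log10 of a p-value (assumed meaning of 'inverse log')."""
    return -math.log10(p_value)


def should_label(p_value, threshold=LABEL_THRESHOLD):
    """True when the p-value is at least as significant as the cutoff."""
    return p_value <= threshold


# 1.65E-89 is the Suz12 overlap p-value reported for Sox9 up-regulation.
print(round(bar_height(1.65e-89), 2), should_label(1.65e-89))
```

On this scale, smaller p-values give taller bars, so the labeled peaks correspond to the most significant overlaps.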

The p-values for overlap with Suz12 targets were highly significant, reaching, for example, 1.65E-89 for the up-regulation of Sox9. Additional confidence comes from the fact that the ChEA database contains two independent Suz12 ChIP-X experiments that do not fully overlap, and both studies appeared at the top for almost all the gain-of-function experiments. Suz12 is a member of the polycomb group (PcG) complex responsible for methylation of lysine 9 and 27 of histone 3. Such methylation is known to cause transcriptional suppression of differentiation genes [2 PMID: 15225548]. Hence, the fact that the changes in gene expression observed in this study are strongly associated with Suz12 targets, regardless of the perturbation applied to mESCs, may imply that almost all perturbations cause differentiation. This suggests that the quantitative levels of many components of the self-renewal machinery must be critically balanced to maintain the pluripotent state. The type of perturbation itself seems to be less critical, as any perturbation induces similar global changes in chromatin organization.


1. Nishiyama A, Xin L, Sharov AA, Thomas M, Mowrer G, Meyers E, Piao Y, Mehta S, Yee S, Nakatake Y et al: Uncovering Early Response of Gene Regulatory Networks in ESCs by Systematic Induction of Transcription Factors. Cell Stem Cell 2009, 5(4):420-433. PMID: 19796622
2. Cao R, Zhang Y: SUZ12 Is Required for Both the Histone Methyltransferase Activity and the Silencing Function of the EED-EZH2 Complex. Molecular Cell 2004, 15(1):57-67. PMID: 15225548

Case study 3: Cross-referencing ChEA with CMAP for designing multiple drug treatments for cancer

For this case study we combined ChEA with the Connectivity Map (CMAP) [1 PMID: 17008526]. Such a combination of databases can be used to identify and rank small molecules that can potentially control the activity of specific transcription factors. CMAP is a dataset of mRNA microarray expression profiles of drug-stimulated mammalian cancer cells. It contains ~6,000 perturbations with ~1,300 single drugs, sometimes at different concentrations, in different cell types, or under other variable experimental conditions. Taking the genes that increased or decreased significantly after a drug perturbation, we can use ChEA to rank the transcription factors that most likely regulate those genes, i.e., the transcription factors whose targets are statistically over-represented among them. This ranking can be used to design combinations of drugs that can potentially counteract the activity of specific transcription factors in a specific cellular context (see Fig. 2 below).

Figure 2 - Concept of using pair-wise combinations of drugs to influence a TF target-gene space

Algorithmically, we can define two families of sets: one describing the relationship between drugs and the mRNAs they affect, based on CMAP, and the other describing transcription factors and their target genes, based on entries from the ChIP-X database. The drug-mRNA family DR contains one set DRj for each CMAP perturbation experiment j, where j is used as the set label; DRj holds the top 500 genes that increased or decreased in expression after treatment with the drug used in experiment j. Hence, the cardinality of every set DRj is always 500. Elements of the DRj sets are gene symbols. The second family, TR, is made of target genes from all the ChIP-X experiments: TRj contains the gene symbols reported to be targets of a transcription factor k in ChIP-X experiment j, where j is again the set label. We can now operate on these two families to find the best pair-wise combinations of drugs that cover the target space of transcription factor k. First, we compute the union of DRi with DRj for every pair of perturbations to create a new family of sets DP describing how pairs of drugs may affect gene expression in an additive manner. DP therefore contains n(n-1)/2 sets, one per unordered pair, where n is the number of drug perturbation experiments in CMAP. The elements of the sets in DP are also gene symbols. We are now ready to score how drug perturbation pairs may affect the activity of transcription factor k in experiment j. The score is simply the size of the intersection between a set in DP and the set TRj.
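The set operations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the case study: the function name, the perturbation labels, and the toy gene symbols are all hypothetical, and the toy sets are far smaller than the 500-gene sets taken from CMAP.

```python
from itertools import combinations


def rank_drug_pairs(drug_sets, tf_targets):
    """Rank unordered pairs of perturbations by their coverage of TF targets.

    drug_sets:  dict mapping a perturbation label to its gene set (a DR set)
    tf_targets: set of target genes for one factor in one experiment (a TR set)
    Returns rows of (covered, label_1, label_2, hits_1, hits_2, overlap).
    """
    rows = []
    for (label_1, genes_1), (label_2, genes_2) in combinations(drug_sets.items(), 2):
        hits_1 = genes_1 & tf_targets
        hits_2 = genes_2 & tf_targets
        # Same as (genes_1 | genes_2) & tf_targets: the DP set intersected with TR.
        covered = hits_1 | hits_2
        rows.append((len(covered), label_1, label_2,
                     len(hits_1), len(hits_2), len(hits_1 & hits_2)))
    # Highest coverage first; ties keep their original pair order.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


# Toy data: labels and gene symbols are made up for illustration.
drug_sets = {
    "drugA-1": {"G1", "G2", "G3"},
    "drugB-2": {"G3", "G4", "G5"},
    "drugC-3": {"G1", "G2", "G9"},
}
tf_targets = {"G1", "G2", "G3", "G4", "G8"}

for row in rank_drug_pairs(drug_sets, tf_targets):
    print(row)
```

Reporting the per-drug hits and their overlap next to the combined coverage makes it easy to see whether two drugs reach different parts of the target space or largely the same genes.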

The scoring scheme can be used to suggest, for example, how we can use small molecules to induce the activity of specific transcription factors such as Oct4 for iPS reprogramming, or for blocking uncontrolled cell proliferation by targeting the oncogene Myc. In this case study we devise mechanisms to down-regulate the activity of the transcription factor Myc and hence potentially block the proliferation of cancer cells. We ranked pairs of drugs based on their combined coverage of Myc targets (Fig. 2). Our strategy optimizes selection of drug-pairs that do not have similar effects on Myc targets.

The table below provides the resulting top ten pair-wise entries.

| Drug1-ExpID | Drug2-ExpID | Targets for D1/D2/Overlap | Total Targets |
| nocodazole-1393 | valproic acid-1047 | 178/192/7 | 363/500 |

Top ranked pairs of drug perturbations from CMAP experiments that cover the gene target space of Myc, as determined independently by several ChIP-X experiments. D1: drug 1; D2: drug 2; ExpID: the experiment ID from CMAP.
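The counts in the nocodazole and valproic acid row are consistent with inclusion-exclusion: the targets covered by either drug equal the targets hit by each drug, minus those hit by both. The short check below illustrates this reading of the columns; the function name is hypothetical, and the meaning of the denominator in the last column is not stated in the table, so it is left out.

```python
def union_coverage(hits_d1, hits_d2, shared):
    """Number of targets covered by either drug, by inclusion-exclusion."""
    return hits_d1 + hits_d2 - shared


# Values from the nocodazole-1393 / valproic acid-1047 row: 178/192/7.
print(union_coverage(178, 192, 7))  # 363
```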

The top-ten ranked list of drug pairs suggests combinations of drug treatments for maximally reducing Myc transcriptional regulatory activity. The combinations we identified include known cancer drugs as well as other drugs. For example, monastrol is a known cancer drug that targets kinesin-5, a motor protein important in mitosis [2 PMID: 10542155], whereas clonidine is an alpha-adrenergic agonist [3 PMID: 4393798], an anti-hypertensive that is also used to aid sleep and to treat ADHD. Hence, clonidine is likely acting through a different pathway to regulate a subset of Myc-regulated target genes. Our initial approach of combining and ranking pairs of drugs to regulate the activity of specific transcription factors can be improved in many ways. For example, the formulation can be extended by using quantitative values instead of sets and by including statistical randomization as a control. In summary, the approach presented in this third case study is a step toward the rational combinatorial application of drugs to treat specific cancers, anchored on transcription factors.


1. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet J-P, Subramanian A, Ross KN et al: The Connectivity Map: Using Gene-Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 2006, 313(5795):1929-1935. PMID: 17008526
2. Mayer TU, Kapoor TM, Haggarty SJ, King RW, Schreiber SL, Mitchison TJ: Small Molecule Inhibitor of Mitotic Spindle Bipolarity Identified in a Phenotype-Based Screen. Science 1999, 286(5441):971-974. PMID: 10542155
3. Andén NE, Corrodi H, Fuxe K, Hökfelt B, Hökfelt T, Rydin C, Svensson T: Evidence for a central noradrenaline receptor stimulation by clonidine. Life Science 1970, 9(9):513-523. PMID: 4393798

| Ma'ayan Laboratory | Systems Biology Center New York | Mount Sinai School of Medicine |