Submit Your Gene Set for Analysis with ChEA3

About ChEA3

Citation

Keenan AB, Torre D, Lachmann A, Leong AK, Wojciechowicz M, Utti V, Jagodnik K, Kropiwnicki E, Wang Z, Ma'ayan A (2019) ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Research.
doi: 10.1093/nar/gkz446

Transcription factors (TFs) are proteins that control gene expression by binding and unbinding near coding regions to regulate the transcriptional machinery. TF enrichment analysis (TFEA) prioritizes transcription factors based on the overlap between a given list of differentially expressed genes and previously annotated TF targets assembled from multiple resources. ChEA3 is a web-based TFEA tool that can aid in identifying the TFs responsible for observed changes in gene expression between control and perturbation samples. We previously developed and published ChEA and ChEA2, ChIP-seq enrichment analysis tools built on gene set libraries created from published ChIP-seq data extracted from multiple sources. ChEA3 builds upon these prior versions by including more libraries, adding benchmarks, and adding integrative library analyses.

For ChEA3 the following TF-target gene set libraries were assembled: putative targets as determined by ChIP-seq experiments from ENCODE, ReMap, and individual publications; co-expression of TFs with other genes based on processed RNA-seq from GTEx and ARCHS4; co-occurrence of TFs with other genes by examining thousands of gene lists submitted to the tool Enrichr; and gene signatures resulting from single TF perturbations followed by genome-wide gene expression experiments. The ENCODE TF target library contains ChIP-seq experiments from human and mouse and the Literature ChIP-seq library contains ChIP-seq experiments from human, mouse and rat. GTEx, ARCHS4, and ReMap libraries were constructed from human data.

The ChEA3 web application outputs enrichment results in the form of searchable, sortable data tables for each library and integration method. Using various benchmarks, we found that integrating libraries improves how accurately TFs are ranked.

Transcription Factor Target Over-representation Analysis - The goal of ChEA3 is to predict transcription factors (TFs) associated with user-input sets of genes. Discrete query gene sets are compared to ChEA3 libraries of TF target gene sets assembled from multiple orthogonal 'omics' datasets. Fisher's Exact Test, with a background size of 20,000, is used to compare the input gene set to each TF target gene set in order to determine which TFs may be most closely associated with the input gene set.
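
The overlap test described above can be sketched in a few lines of Python. This is a minimal illustration, not the ChEA3 implementation: the function name `overlap_pvalue` is hypothetical, and it assumes a one-sided (over-representation) form of Fisher's Exact Test, which is equivalent to the upper tail of the hypergeometric distribution. Only the background size of 20,000 comes from the text.

```python
from math import comb


def overlap_pvalue(query, targets, background_size=20000):
    """One-sided Fisher's Exact Test p-value for the overlap of two gene sets.

    Assumption: the test is one-sided (over-representation), so the p-value is
    the probability of seeing at least the observed overlap when len(query)
    genes are drawn at random from a background of `background_size` genes.
    """
    query, targets = set(query), set(targets)
    overlap = len(query & targets)
    n_query, n_targets = len(query), len(targets)

    # Hypergeometric upper tail, computed exactly with integer arithmetic.
    total = comb(background_size, n_query)
    tail = sum(
        comb(n_targets, i) * comb(background_size - n_targets, n_query - i)
        for i in range(overlap, min(n_query, n_targets) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy example: 3 of 4 query genes are annotated targets of a TF.
    query = ["FOXM1", "MYC", "STAT1", "STAT3"]
    targets = ["MYC", "STAT1", "STAT3", "SMAD3", "SMAD9"]
    print(overlap_pvalue(query, targets))
```

Ranking every TF in a library by this p-value, smallest first, gives the kind of per-library ordering the results tables are sorted by.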

Input - Users can submit a set of human or mouse gene symbols for transcription factor enrichment analysis. The input gene set can be derived from a variety of sources. Commonly, it is a set of differentially expressed genes, with the goal of forming hypotheses about which TFs might be responsible for the changes in gene expression. Gene sets from GWAS studies or whole-genome CRISPR-KO screen hits can also be submitted to identify common regulators of those genes.

TF Co-expression Network Visualizations - The ChEA3 application displays a scatter plot of points representing human transcription factors based on their co-expression similarity. This TF co-expression network was constructed by conducting a Weighted Gene Co-expression Network Analysis (WGCNA) over human TF expression data. Weighted gene co-expression networks (WGCNs) were constructed from GTEx samples, TCGA samples, and ARCHS4 samples. The network layouts were determined using Allegro Edge-Repulsive Strong Clustering in Cytoscape. For the GTEx networks, WGCNA TF module summary expression profiles, also known as Eigen-genes, were correlated with gene expression profiles from all GTEx tissue samples to determine strong module-tissue associations. Each GTEx WGCN module may be colored by its most highly correlated tissue. Tissues are aggregated into more general tissue types (e.g. brain) and more specific tissue types (e.g. cerebellum, frontal cortex):
General Tissue | Specific Tissue
Adipose Tissue | Adipose - Subcutaneous
Adrenal Gland | Adrenal Gland
Blood | Whole Blood
Blood | Cells - EBV-transformed lymphocytes
Blood Vessel | Artery - Tibial
Brain | Brain - Cerebellum
Brain | Brain - Frontal Cortex (BA9)
Brain | Brain - Spinal cord (cervical c-1)
Colon | Colon - Sigmoid
Colon | Colon - Transverse
Esophagus | Esophagus - Muscularis
Esophagus | Esophagus - Mucosa
Heart | Heart - Atrial Appendage
Liver | Liver
Muscle | Muscle - Skeletal
Nerve | Nerve - Tibial
Pancreas | Pancreas
Pituitary | Pituitary
Prostate | Prostate
Skin | Skin - Sun Exposed (Lower leg)
Skin | Cells - Transformed fibroblasts
Testis | Testis
Thyroid | Thyroid
Uterus | Uterus

GTEx modules may also be colored by the top GO enrichment term. TCGA module Eigen-genes were correlated with tumor gene expression profiles to determine strong module-tumor type associations. Each TCGA WGCN module may be colored by its most highly correlated tumor type. Finally, the ARCHS4 module Eigen-genes were correlated with ARCHS4 tissue types. The ARCHS4 network modules may be colored by the most highly correlated tissue type.

Benchmarking Predictions - The performance of each library and integration method was benchmarked using 946 single-TF loss-of-function (LOF) and gain-of-function (GOF) experiments mined from GEO with microarray and RNA-seq readouts. These TF LOF/GOF experiments are composed of 503 mouse experiments and 433 human experiments. Gene sets were generated from the differentially expressed genes of these experiments and submitted to the ChEA3 API. The positive class consists of ranks assigned by ChEA3 to the TF perturbed in each experiment. The negative class consists of ranks assigned by ChEA3 to TFs not perturbed in the experiment. The areas under the receiver operating characteristic (ROC) and precision-recall (PR) curves were used to evaluate how well ChEA3 ranked the true perturbed TF. Each library has a different number of TFs covered. To enable comparison, the negative class was downsampled to the size of the positive class to generate PR curves such that a random classifier has an AUC of 0.5. These PR curves were bootstrapped 5,000 times to generate mean PR AUCs and composite PR curves. ROC curves were generated using the same bootstrapping method.
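
The balanced, bootstrapped evaluation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' benchmarking code: the function names are hypothetical, ranks are assumed to be scaled so that lower is better, ROC AUC is computed as the probability that a positive outranks a negative, and PR AUC is approximated by average precision. Only the downsampling of the negative class to the size of the positive class and the default of 5,000 bootstrap iterations come from the text.

```python
import random


def roc_auc(pos, neg):
    """Chance that a perturbed TF gets a better (lower) rank than another TF."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(pos, neg):
    """Approximate PR AUC: mean precision at the position of each positive."""
    labeled = sorted([(r, 1) for r in pos] + [(r, 0) for r in neg])
    hits, total = 0, 0.0
    for position, (_, label) in enumerate(labeled, start=1):
        if label:
            hits += 1
            total += hits / position
    return total / len(pos)


def bootstrap_auc(pos, neg, n_boot=5000, seed=0):
    """Mean ROC AUC and PR AUC over balanced resamples of the negative class."""
    rng = random.Random(seed)
    roc_scores, pr_scores = [], []
    for _ in range(n_boot):
        # Downsample negatives to the size of the positive class.
        sample = rng.sample(neg, len(pos))
        roc_scores.append(roc_auc(pos, sample))
        pr_scores.append(average_precision(pos, sample))
    return sum(roc_scores) / n_boot, sum(pr_scores) / n_boot


if __name__ == "__main__":
    perturbed = [0.01, 0.05, 0.40]           # scaled ranks of perturbed TFs
    others = [0.02, 0.3, 0.5, 0.6, 0.8, 0.9]  # scaled ranks of other TFs
    print(bootstrap_auc(perturbed, others, n_boot=200))
```

Balancing the classes in each resample is what makes the PR AUC values comparable across libraries that cover different numbers of TFs.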

Figure 1. Performance of the ChEA3 libraries and integration techniques in recovering the perturbed TFs from 946 TF LOF and GOF experiments from the benchmark dataset. a) Mean ROC AUC and mean PR AUC over 5,000 bootstrapped ROC and PR curves; b) Composite ROC curves generated from 5,000 bootstrapped curves; c) Composite PR curves generated from 5,000 bootstrapped curves; d) The deviation of the cumulative distribution from uniform of the scaled rankings of each perturbed TF in the benchmarking dataset. Anderson-Darling test of uniformity: MeanRank p = 6.34 x 10^-7; TopRank p = 6.34 x 10^-7; ARCHS4 p = 6.34 x 10^-7; ENCODE p = 2.06 x 10^-6; Enrichr Queries p = 6.83 x 10^-7; GTEx p = 6.45 x 10^-7; Literature ChIP-seq p = 1.28 x 10^-6; ReMap p = 1.02 x 10^-6.

Tutorial

Discover transcription factors associated with your gene set

  1. Submit Your Query - Submit a query with one gene symbol per row in the text box, or upload a text file with one gene symbol per line. ChEA3 accepts only HGNC-approved gene symbols and discards probe names, transcript IDs, and other unrecognizable IDs that have not been converted to gene symbols. Matching does not consider letter case, so the tool can accept genes from other species whose ortholog symbols map directly to human symbols. You can use the example gene set, available as a link above the submission box, to gain familiarity with the expected format. After you submit your query, you will be presented with the results page. Most gene set queries are processed in less than 30 seconds.

  2. Navigating Your Results - Results tables from the integration methods MeanRank and TopRank and from each ChEA3 TF target library are accessible from a dropdown menu. The MeanRank table is shown first by default because it performed the best in the ChEA3 benchmark. Tables may be searched by entering terms in the search box, sorted by clicking on a column header, and copied to the clipboard by clicking on the Copy button. Results for the ChEA3 TF target libraries are sorted by the Fisher's Exact Test p-value. This provides a ranking of transcription factors whose putative transcriptional targets are most similar to the query set. Integrated results, which take into account results from all libraries, are sorted in ascending order by score. Lower scores indicate that the transcription factor is more relevant to the query. The top 100 results are returned for each table. The full TF ranking and results for each integration method and library may be downloaded as a tab-separated (tsv) file using a link found at the footer of each results table. Full results are also available via the ChEA3 API.

  3. Visualizing Your Results - A global transcription factor co-expression network, a local results-specific co-regulatory network, bar charts, and a clustergram are available to aid in interpreting the results.

    • TF Co-expression Networks
    • The TF co-expression networks are provided to help users visualize their top ranking transcription factors in the context of the larger human transcriptional regulatory network. There are three networks, generated from GTEx, TCGA, and ARCHS4 expression data. Users may toggle between these three networks using the dropdown menu. The slider above the network designates the number of TFs to highlight in the network. The highlighted TFs are the top results from the library chosen in the library selection dropdown menu. Network node coloring options provide additional information about the tissues or tumor types in which the transcription factors may be most active. The network may be zoomed, panned, and saved as an SVG image. Other options, such as hiding the legend or switching to full screen, can be accessed by clicking the "Network Options" hamburger.

    • TF Co-regulatory Networks
    • TF-TF co-regulatory networks are dynamically generated using the top results of the selected library. Edges between TFs are defined by evidence from the ChEA3 libraries and are directed where ChIP-seq evidence supports the interaction. The supporting evidence for each edge is summarized in a tooltip that appears upon hovering over each edge. The slider above the network indicates the number of top TF results to include in the network from the library results indicated in the library selection dropdown.
    • Bar Charts
    • Bar charts corresponding to the top TFs of the selected library display the -log10(p-value) for ChEA3 TF target library results, and an integrated rank score for the integrated library results. The number of top TFs on the y-axis may be modified using the slider.
    • Clustergram
    • The interactive clustergram shows the overlapping query gene targets among top library results.
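
To make the integrated scores mentioned under "Navigating Your Results" more concrete, here is a small sketch of how per-library rankings could be combined. It is an assumption-laden illustration, not the ChEA3 code: it takes MeanRank to be the average of a TF's ranks across the libraries that contain it, and TopRank to be the TF's best rank after scaling each library's ranks by that library's size. The function name `integrate_ranks` and the toy library names are hypothetical.

```python
def integrate_ranks(library_rankings):
    """Combine per-library TF rankings into MeanRank and TopRank style scores.

    library_rankings maps a library name to its TFs ordered from best to worst.
    Assumptions: MeanRank averages integer ranks over the libraries that
    contain the TF; TopRank is the best rank after dividing by library size.
    Lower scores are better in both cases.
    """
    ranks, scaled = {}, {}
    for ordered_tfs in library_rankings.values():
        size = len(ordered_tfs)
        for rank, tf in enumerate(ordered_tfs, start=1):
            ranks.setdefault(tf, []).append(rank)
            scaled.setdefault(tf, []).append(rank / size)

    mean_rank = {tf: sum(r) / len(r) for tf, r in ranks.items()}
    top_rank = {tf: min(s) for tf, s in scaled.items()}
    return mean_rank, top_rank


if __name__ == "__main__":
    rankings = {
        "library_A": ["MYC", "STAT3"],
        "library_B": ["STAT3", "MYC", "FOXM1"],
    }
    mean_rank, top_rank = integrate_ranks(rankings)
    # Sort ascending: lower integrated scores come first.
    print(sorted(mean_rank.items(), key=lambda item: item[1]))
    print(sorted(top_rank.items(), key=lambda item: item[1]))
```

Sorting either dictionary in ascending order reproduces the convention in the results tables, where lower integrated scores appear at the top.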


Run ChEA3 Locally - The ChEA3 Docker image is available on DockerHub. In order to run the ChEA3 web application locally, Docker must be installed. To install Docker, please visit https://docs.docker.com/install/.

Pull the ChEA3 Docker image:
$ docker pull maayanlab/chea3
Run ChEA3 locally on port 5000:
$ docker run -p 5000:8080 maayanlab/chea3
In your web browser:
localhost:5000/chea3/

API

ChEA3 has an API to enable programmatic access for querying the ChEA3 database.

The ChEA3 REST API uses POST to transport user-submitted, JSON-formatted gene sets and JSON-formatted query results between the ChEA3 server and the user’s script. The user gene set and optional additional information are encoded in JSON format.

Method: POST
URL: /chea3/api/enrich/
Returns: JSON array of ChEA3 library result objects
Parameters:
query_name: String
gene_set: An array of strings


  1. Command-line Example - The following returns results from all ChEA3 libraries in JSON format:
    $ curl -d '{"query_name":"myQuery", "gene_set":["FOXM1","SMAD9","MYC","SMAD3","STAT1","STAT3"]}' -H 'Content-Type: application/json' https://amp.pharm.mssm.edu/chea3/api/enrich/
    Redirect the JSON-formatted results to a file:
    $ curl -d '{"query_name":"myQuery", "gene_set":["FOXM1","SMAD9","MYC","SMAD3","STAT1","STAT3"]}' -H 'Content-Type: application/json' https://amp.pharm.mssm.edu/chea3/api/enrich/ > results.json
  2. R Example -
    library(httr)
    library(jsonlite)

    # Query gene set
    genes = c("SMAD9", "FOXO1", "MYC", "STAT1", "STAT3", "SMAD3")

    url = "https://amp.pharm.mssm.edu/chea3/api/enrich/"
    encode = "json"
    payload = list(query_name = "myQuery", gene_set = genes)

    # POST to ChEA3 server
    response = POST(url = url, body = payload, encode = encode)
    json = content(response, "text", encoding = "UTF-8")

    # Results as a list of R data frames
    results = fromJSON(json)

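The same request can also be made from Python using only the standard library. This is a hedged sketch that mirrors the curl and R examples: the URL, the `query_name` and `gene_set` fields, and the `Content-Type` header come from those examples, while the helper names `build_payload` and `query_chea3`, the removal of blank and repeated symbols, and the 60-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint copied from the curl example in this section.
CHEA3_URL = "https://amp.pharm.mssm.edu/chea3/api/enrich/"


def build_payload(symbols, query_name="myQuery"):
    """Build the JSON body, dropping blank lines and repeated symbols.

    Assumption: duplicates are compared case-insensitively, since the tool is
    described as not considering letter case.
    """
    seen = set()
    gene_set = []
    for symbol in symbols:
        symbol = symbol.strip()
        key = symbol.upper()
        if symbol and key not in seen:
            seen.add(key)
            gene_set.append(symbol)
    return {"query_name": query_name, "gene_set": gene_set}


def query_chea3(symbols, query_name="myQuery", url=CHEA3_URL, timeout=60):
    """POST the gene set to the enrich endpoint and return the parsed JSON."""
    body = json.dumps(build_payload(symbols, query_name)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    genes = ["SMAD9", "FOXO1", "MYC", "STAT1", "STAT3", "SMAD3"]
    results = query_chea3(genes)
    print(type(results))
```

Keeping payload construction separate from the network call makes it easy to check the request body before sending anything to the server.
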
Download

"Primary" libraries are used by ChEA3 to conduct TF target over-representation analysis, "Benchmarking" libraries were used to benchmark ChEA3, and "Additional" libraries are available as an additional resource to the community.
Library | Library Type | Total TFs | Unique TFs | Average Gene Set Length | Unique Interactions
ARCHS4_Coexpression | Primary | 1628 | 1628 | 300.0 | 480504
ENCODE_ChIP-seq | Primary | 552 | 118 | 1570.4 | 392667
Enrichr_Queries | Primary | 1404 | 1404 | 297.7 | 409279
GTEx_Coexpression | Primary | 1607 | 1607 | 300.0 | 468672
Literature_ChIP-seq | Primary | 307 | 164 | 1264.1 | 340547
ReMap_ChIP-seq | Primary | 297 | 297 | 1405.8 | 417025
Cusanovich_shRNA_TFs | Benchmarking | 49 | 49 | 600.0 | 29394
Single-TF_Perturbations | Benchmarking | 946 | 323 | 538.1 | 365679
HuMAP_TF_PPIs | Additional | 460 | 460 | 18.5 | 7561
BioGRID_TF_PPI_low_throughput | Additional | 498 | 498 | 7.3 | 3046
TFs_generif | Additional | 686 | 686 | 79.5 | 51999
TFs_generif_predicted_autorif_cooccurrence | Additional | 686 | 686 | 200.0 | 132732
TFs_generif_predicted_coexpression_ARCHS4 | Additional | 686 | 686 | 200.0 | 136973
TFs_generif_predicted_enrichr_cooccurrence | Additional | 686 | 686 | 200.0 | 135897
TFs_generif_predicted_generif_cooccurrence | Additional | 686 | 686 | 200.0 | 137135
TFs_generif_predicted_tagger_cooccurrence | Additional | 686 | 686 | 200.0 | 132083
TFs_tagger | Additional | 1390 | 1390 | 145.5 | 187598
TFs_tagger_predicted_autorif_cooccurrence | Additional | 1390 | 1390 | 198.5 | 263726
TFs_tagger_predicted_coexpression_ARCHS4 | Additional | 1390 | 1390 | 200.0 | 277376
TFs_tagger_predicted_enrichr_cooccurrence | Additional | 1387 | 1387 | 200.0 | 270434
TFs_tagger_predicted_generif_cooccurrence | Additional | 1387 | 1387 | 199.8 | 277104
TFs_tagger_predicted_tagger_cooccurrence | Additional | 1390 | 1390 | 199.8 | 258758
adipose.TFs | Additional | 1620 | 1620 | 300.0 | 473758
all_tissues.TFs | Additional | 1596 | 1596 | 300.0 | 468078
brain.TFs | Additional | 1620 | 1620 | 300.0 | 481106
breast.TFs | Additional | 1620 | 1620 | 300.0 | 474941
cervix.TFs | Additional | 1620 | 1620 | 300.0 | 476307
circulatory.TFs | Additional | 1620 | 1620 | 300.0 | 475219
colon.TFs | Additional | 1620 | 1620 | 300.0 | 476183
endocrine_glands.TFs | Additional | 1620 | 1620 | 300.0 | 475620
fibroblast.TFs | Additional | 1620 | 1620 | 300.0 | 476848
heart.TFs | Additional | 1620 | 1620 | 300.0 | 472195
hematopoietic.TFs | Additional | 1620 | 1620 | 300.0 | 478836
innate_immunity.TFs | Additional | 1620 | 1620 | 300.0 | 475804
liver.TFs | Additional | 1620 | 1620 | 300.0 | 473139
lung.TFs | Additional | 1620 | 1620 | 300.0 | 474951
lymphatic.TFs | Additional | 1620 | 1620 | 300.0 | 473749
muscle.TFs | Additional | 1620 | 1620 | 300.0 | 474280
osteoblast.TFs | Additional | 1620 | 1620 | 300.0 | 472586
other_lower_GI.TFs | Additional | 1620 | 1620 | 300.0 | 477124
ovary.TFs | Additional | 1620 | 1620 | 300.0 | 474686
pancreas.TFs | Additional | 1620 | 1620 | 300.0 | 474226
prostate.TFs | Additional | 1620 | 1620 | 300.0 | 473895
retina.TFs | Additional | 1620 | 1620 | 300.0 | 472304
skin.TFs | Additional | 1620 | 1620 | 300.0 | 477857
upper_GI.TFs | Additional | 1620 | 1620 | 300.0 | 476742
urinary.TFs | Additional | 1620 | 1620 | 300.0 | 471313

Affiliations
The Ma'ayan Lab
BD2K-LINCS Data Coordination and Integration Center (DCIC)
NIH LINCS program
Data Commons Pilot Project Consortium (DCPPC)
NIH Illuminating the Druggable Genome (IDG) Program
Icahn School of Medicine at Mount Sinai
Mount Sinai Center for Bioinformatics
License
All contents on this site are covered by the Creative Commons Attribution 4.0 International License.
