In this update of Enrichr we report that we submitted the Enrichr API to SmartAPI so that Enrichr can be integrated with other tools and products of the NIH Data Commons. This release of Enrichr contains new reference genomes, human (hg19 and hg38) and mouse (mm9 and mm10), for the BED-file conversion and upload. Since the last update, many new gene-set libraries were either added or updated. Two new libraries were created from the aggregated knowledge extracted from Enrichr submitted queries. For example, the new “Enrichr Submissions TF-Gene Coocurrence” library is made of all human transcription factors and the genes that most often co-occur with them in Enrichr submitted queries. Similarly, we also created a library that has the most popular genes depending on the data acquisition method. We found that some genes tend to be over-represented in specific libraries just due to the data acquisition method, for example, genes highly represented in microarray or RNA-seq signatures. In addition, of the two microRNA-target libraries, miRTarBase was added and TargetScan was updated; a library created from DSigDB was also added. DSigDB is a resource that relates drugs and small molecules to their target genes based on various types of data. Finally, an information icon was added to the dashboard view to show more information about each gene set library when browsing the Enrichr results.
For this release we added five libraries generated from the ARCHS4 project. ARCHS4 contains processed RNA-seq data from over 100,000 publicly available samples profiled by the two major deep sequencing platforms HiSeq 2000 and HiSeq 2500. After alignment and normalization, we computed co-expression correlation for all human genes. From this co-expression correlation matrix, we generated three new libraries: a) top 300 genes that are co-expressed with transcription factors; b) top 300 genes that are co-expressed with kinases; and c) top 300 genes that are co-expressed with under-studied drug targets from the Illuminating the Druggable Genome (IDG) project. We also added two additional libraries created from ARCHS4: genes that are highly expressed in human cell lines and tissues. These two libraries were created by z-scoring the expression of each gene across all cell lines or tissues. In addition, we updated the Gene Ontology libraries by removing high-level terms and following a more rigorous process based on an Enrichr user suggestion. Two new counters were added to the landing page showing the number of libraries and the number of terms across all libraries. We also changed the way the combined score is calculated by multiplying the unadjusted, instead of the adjusted, p-values with the z-scores.
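The co-expression libraries described above can be sketched roughly as follows. This is a minimal illustration, not the ARCHS4 pipeline: it assumes Pearson correlation (the notes only say "co-expression correlation"), and the function names `pearson` and `top_coexpressed` and the toy expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as 0 here.
    return cov / (sx * sy) if sx and sy else 0.0

def top_coexpressed(expression, seeds, n_top=300):
    """Map each seed gene (e.g. a transcription factor) to the n_top genes
    whose profiles correlate most strongly with it.

    expression: dict of gene symbol -> list of expression values per sample.
    """
    library = {}
    for seed in seeds:
        scores = [(pearson(expression[seed], profile), gene)
                  for gene, profile in expression.items() if gene != seed]
        scores.sort(reverse=True)  # highest correlation first
        library[seed] = [gene for _, gene in scores[:n_top]]
    return library

# Toy data (made up): four genes measured in four samples.
toy = {
    "TF1": [1, 2, 3, 4],
    "A": [2, 4, 6, 8.5],
    "B": [4, 3, 2, 1],
    "C": [1, 3, 2, 4],
}
toy_library = top_coexpressed(toy, ["TF1"], n_top=2)
```

Each key of the resulting dictionary plays the role of a library term, and its list is the gene set for that term.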
In this release of Enrichr we added and updated several gene set libraries. The MGI Mammalian Phenotype library was updated and now contains 5231 terms that describe phenotypes. This library has many more terms than the old MGI library, which had 476 terms. The old version was created in 2013 and can now be found in the Legacy category for provenance. We also added a new library to the Crowd category. All the signatures in the Crowd category so far were from microarray studies. The new library is made of 1302 signatures created from RNA-seq data. The library contains disease, gene, and drug signatures extracted manually from GEO. Another new library was added to the Pathways category. This library was created from hu.MAP, a new database of human protein-protein interactions determined by over 9,000 mass spectrometry experiments performed by the Marcotte Lab from UT Austin. A paper that describes the hu.MAP project is available on bioRxiv. We also added three new libraries to the Ontologies category. These libraries were created from the COMPARTMENTS, TISSUES, and DISEASES datasets developed by the Jensen Lab from the University of Copenhagen. Besides new and updated libraries, we also updated the BED-file upload feature. We now support various reference genomes: for human we support hg18, hg19 and hg38, and for mouse mm9 and mm10.
This new version of Enrichr includes many major changes and updates. The enrichment results are now summarized as bar graphs of the enriched terms for all libraries within a category. Another important update is a correction to the enrichment analysis formula to better match the classic Fisher exact test. For backward compatibility, the old enrichment scores can be found in the downloadable spreadsheets under the columns: old p-values and adjusted old p-values. In this release we also added an information icon that provides descriptions for each library. There are also two new libraries: the DrugMatrix library and ChEA 2016. The ChEA 2016 library includes 250 new entries from published ChIP-seq studies that we collected and processed in the past year. This is a 63% growth in size for ChEA.
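For readers unfamiliar with the test, the following is a minimal sketch of a right-tailed Fisher exact test for gene-set overlap, computed from the hypergeometric distribution. It is a textbook formulation and not the Enrichr implementation; the function name, its parameters, and the choice of background (universe) size are assumptions made for illustration.

```python
from math import comb

def fisher_exact_enrichment(overlap, list_size, set_size, universe):
    """Right-tailed p-value: the probability of seeing at least `overlap`
    genes shared between an input list of `list_size` genes and a gene set
    of `set_size` genes, when both are drawn from `universe` genes.
    """
    max_overlap = min(list_size, set_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Toy numbers (made up): 3 input genes, a gene set of 3, a universe of 10.
p_all_three = fisher_exact_enrichment(3, 3, 3, 10)
p_at_least_two = fisher_exact_enrichment(2, 3, 3, 10)
```

A smaller p-value means the observed overlap is less likely to arise by chance given the assumed background.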
For this release of Enrichr we significantly expanded the Help section with updated detailed description of the expanded Enrichr API.
We also changed the analysis button, and now display the adjusted p-values as tooltips on the bar graphs and in the tables.
Insignificant terms are now displayed in gray. The results can now also be viewed as clustergrams that show the most common genes for the most enriched terms.
In addition, we improved the quality of the fuzzy enrichment option.
Since the last release we updated many of the libraries and added new libraries. The new libraries include: libraries created from the LINCS L1000 data, GTEx, signatures extracted by the crowd from GEO for aging, ligands, pathogens, and MCF7 perturbations. Updated libraries include: KEGG, WikiPathways, ENCODE, ChEA, BioCarta pathways, HumanCyc, NCI-Nature pathways and Panther.
In this new release of Enrichr we updated our ChIP-x Enrichment Analysis (ChEA) database with gene sets extracted from forty new studies. The previous version is now in the 'Legacy' category for provenance.
We also added a new gene set library we created from the database of Genotypes and Phenotypes (dbGaP), as well as two new libraries with the up- and down-regulated genes from the L1000 Connectivity Map chemical perturbation profiles from the Broad Institute LINCS Center for Transcriptomics. The previous version of the Connectivity Map Affymetrix data was renamed to Old CMAP.
In this release we improved the 'Find a Gene' feature, making it clearer and more descriptive.
If you haven’t noticed, Enrichr now has a calendar view of submission statistics - you can access it by clicking on the link of lists analyzed.
The documentation of the Enrichr API was also updated. We recently used the Enrichr API to develop a new Mobile App called the Harmonizome. This mobile app is available at Google Play and the App Store. With this app you can explore aggregated knowledge about mammalian genes. The knowledge provided within this app is a subset of the Harmonizome project which can be accessed at: http://amp.pharm.mssm.edu/Harmonizome.
In this release we added a new category to Enrichr called "Crowd". In this category we will have gene set libraries that are created through our crowdsourcing efforts. The Crowd category currently contains six gene-set libraries for up/down genes in disease vs. normal tissue, before and after drug perturbation of mammalian cells, and before and after single gene manipulation in mammalian cells.
This release also has a major upgrade to our own kinase enrichment analysis (KEA) library with many more kinase-substrate interactions. We also created a gene set library from NIH Reporter by associating grants with genes through grant related publications and GeneRIF.
This release of Enrichr also contains several bug fixes, improved table sorting, and new canvases and networks for all libraries.
A new related addition to Enrichr is GEO2Enrichr. GEO2Enrichr is a browser extension plug-in and an independent web-based application that enables users of Enrichr to process expression data tables from GEO, or from their own unpublished studies. The application is implemented as a Chrome extension or a Firefox Add-on. With GEO2Enrichr you can quickly extract differentially expressed genes from published datasets on GEO, or from your own data, and analyze these lists with Enrichr. A YouTube video from a recent works-in-progress presentation about GEO2Enrichr is available.
All the gene set libraries of Enrichr are now available for download. These datasets can be used for global and local analyses, and for building new tools. Please acknowledge our Enrichr publication if you use one of the original gene-set library files we created.
Several new gene set libraries were added to Enrichr in the past few months: pathway gene-set libraries created from HumanCyc, NCI-Nature PID, and Panther; gene set libraries created from the human phenotype ontology and Uberon cross species phenotype ontology; a gene set library extracted from our ESCAPE database; and a gene set library that groups genes based on their evolutionary age, created from Homologene.
This release of Enrichr includes a complete redesign of the gene set library database. This update makes Enrichr load and display results faster.
This release also contains several new and updated gene set libraries bringing the total number of libraries to 69 and gene sets to 56498. The new and updated libraries are listed below:
The ENCODE transcription factors and histone modifications libraries were updated using the datasets listed at: https://www.encodeproject.org
The Pathways category now has a phosphosite enrichment analysis functionality using data processed from DEPOD: http://www.koehn.embl.de/depod
The Diseases/Drugs category has data from the Achilles project associating individual gene knockdowns with the response of cancer cell lines to those knockdowns: http://www.broadinstitute.org/achilles
The Cell Types category now has processed gene lists from the Allen Brain Atlas (http://www.brain-map.org) where each gene set describes highly and lowly expressed genes in hundreds of different brain regions.
We also added a Legacy category to list old gene set libraries so users can reproduce enrichment results they obtained before these libraries were updated.
The Human Phenotype Ontology is an ontology of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. We have taken a cross-section of the ontology at the level resulting in appropriately sized gene sets.
We have updated the three Gene Ontology Consortium gene set libraries. These libraries are created using the core ontology from the Gene Ontology Consortium, annotated with associated Homo sapiens genes. We take a cross-section of the ontology tree at the level resulting in appropriately sized gene sets. The three gene set libraries in Enrichr are called: GO Biological Process, GO Cellular Component and GO Molecular Function.
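One way to picture taking a cross-section of an ontology is sketched below. It is an illustration under stated assumptions, not the procedure used to build the Enrichr libraries: it assumes that each term at the chosen depth inherits the genes annotated to all of its descendants, and the function name, data layout, and toy terms are hypothetical.

```python
def cut_ontology(children, annotations, root, depth):
    """Return {term: gene set} for every term `depth` steps below `root`.

    children: dict of term -> list of child terms.
    annotations: dict of term -> iterable of genes annotated directly to it.
    Assumption: a term's gene set is the union of its own genes and those
    of all its descendants.
    """
    def collect(term, seen):
        genes = set(annotations.get(term, ()))
        for child in children.get(term, ()):
            if child not in seen:  # guard against revisiting shared children
                seen.add(child)
                genes |= collect(child, seen)
        return genes

    level = [root]
    for _ in range(depth):
        level = [c for term in level for c in children.get(term, ())]
    # dict.fromkeys removes duplicate terms while keeping their order
    return {term: collect(term, {term}) for term in dict.fromkeys(level)}

# Toy ontology (made up): two branches under a single root.
toy_children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
toy_annotations = {"A": ["G1"], "A1": ["G2"], "A2": ["G3"], "B1": ["G4"]}
level_1 = cut_ontology(toy_children, toy_annotations, "root", 1)
level_2 = cut_ontology(toy_children, toy_annotations, "root", 2)
```

Cutting closer to the root yields fewer, larger gene sets, and cutting deeper yields more, smaller ones, which is the trade-off behind choosing a level with appropriately sized sets.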
We have added three additional gene set libraries.
- Virus Perturbations from GEO: This gene set library contains the most significantly up or down regulated genes for 323 different virus perturbations. This data is derived from experiments in GEO that measured gene expression levels before and after virus perturbation.
- Tissue Protein Expression from ProteomicsDB: This gene set library was derived from ProteomicsDB - a high-throughput study profiling protein expression in ~200 tissues. It consists of tissues and their respective significantly up and down regulated genes. The genes are the result of mapping the protein profiles in the study to gene sets.
- Tissue Protein Expression from Human Proteome Map: This gene set library was derived from the Human Proteome Map - a high-throughput study profiling protein expression in ~30 tissues. As above, the data maps gene sets to tissues.
We have added seven new gene set libraries to Enrichr and updated two. Read on for further details of each library.
- ENCODE Histone Modifications: This gene set library maps each histone modification term to its related genes. Here related means the genes near histone modification peaks. The data is taken from the NIH project ENCODE.
- Transcription Factor PPIs: Using our literature based protein-protein interaction network, each transcription factor protein is a term in this library and the proteins that interact with it are the set elements.
- Kinase Perturbations from GEO: This library was compiled by manually searching GEO for experiments that measured gene expression level before and after a kinase perturbation. Each kinase perturbation is a term and its elements are the genes that are differentially expressed.
- Kinase Perturbations from L1000: This is the same as above except the data is created from the L1000 dataset created at the Broad Institute, by the Connectivity Map group.
- MGI Mammalian Phenotype Level 3 and MGI Mammalian Phenotype Level 4: These gene set libraries take the MGI Mammalian Phenotype ontology and cut it at a depth of 3 or 4. Each phenotype term is mapped to the associated knockout genes in the ontology. The level 4 version already existed in Enrichr; here we added level 3.
- Disease Signatures from GEO: Similarly to Kinase Perturbations from GEO, this gene set library was compiled from experiments from GEO where gene expression levels were measured in normal tissue vs diseased tissue. Each disease is a term in this library and the differentially expressed genes are its elements. This library was contributed by the Dudley lab.
- Drug Perturbations from GEO: This library has been updated with recent experiments from GEO. This library was compiled using experiments from GEO where gene expression levels were measured before and after the administration of an FDA-approved drug. Each drug is a term and the associated differentially expressed genes are its elements.
- ChEA: This library has been updated with recent papers from PubMed. This library consists of transcription factors featured in papers in PubMed. Each term is a transcription factor together with the PubMed ID of the paper the data was taken from, and the set elements are the genes each paper links to the transcription factor.
Enrichr can now accept BED files as input for enrichment. Enrichr automatically converts the BED file into a gene list. The species supported are human and mouse. The maximum number of genes to produce from the BED file can be adjusted. If the gene list produced by the conversion has more genes than the maximum, Enrichr will take the best matching 500, 1000 or 2000 genes. The file must have the extension ".bed" and must be uploaded, not copied and pasted.
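The conversion from BED regions to a capped gene list could look roughly like the sketch below. How Enrichr defines "best matching" is not stated in these notes, so the sketch assumes that each region is assigned to the gene with the nearest transcription start site (TSS) on the same chromosome and that genes are ranked by that distance. The function names and the tiny TSS table are hypothetical.

```python
def parse_bed(text):
    """Read (chrom, start, end) from BED text, skipping header lines."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions

def bed_to_genes(regions, tss, max_genes=2000):
    """Assign each region to its nearest gene and keep the closest matches.

    tss: dict of gene symbol -> (chrom, TSS position); a hypothetical
    annotation table standing in for a real reference genome annotation.
    """
    best = {}
    for chrom, start, end in regions:
        midpoint = (start + end) // 2
        candidates = [(abs(midpoint - pos), gene)
                      for gene, (c, pos) in tss.items() if c == chrom]
        if not candidates:
            continue  # no annotated gene on this chromosome
        distance, gene = min(candidates)
        if gene not in best or distance < best[gene]:
            best[gene] = distance
    ranked = sorted(best, key=lambda g: (best[g], g))
    return ranked[:max_genes]

# Toy input (made up): three usable regions and one on an unannotated chromosome.
toy_bed = "track name=example\nchr1\t100\t200\nchr1\t900\t1100\nchr2\t50\t150\nchrX\t1\t2\n"
toy_tss = {"GENE_A": ("chr1", 160), "GENE_B": ("chr1", 1000), "GENE_C": ("chr2", 500)}
toy_genes = bed_to_genes(parse_bed(toy_bed), toy_tss, max_genes=2)
```

With `max_genes` set to 500, 1000 or 2000, the same ranking step would yield the capped lists mentioned above.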
You can now view your input gene list from the results page and view past saved gene lists from your account page.