Overview

To get started with GEN3VA, you first need to create a collection of tagged gene expression signatures from GEO using GEO2Enrichr. GEO2Enrichr is a browser extension that lets you extract and tag signatures from GEO and automatically adds these signatures to GEN3VA.

Installing and using GEO2Enrichr

To install GEO2Enrichr and to learn how to use it, visit GEO2Enrichr's website.

Tags and reports

A tag is a plain text term that you can use to associate multiple signatures from different studies. These tags are non-hierarchical keywords assigned to a gene signature. In social media, the equivalent idea is a 'hashtag', which is a tag preceded by a '#' or hash symbol. These tags enable the creation of collections of gene signatures around common themes. A gene signature can have multiple tags. Tagging is handled by GEO2Enrichr and is done when the signature is processed. See GEO2Enrichr's documentation for more details.
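
The tagging model described above amounts to a many-to-many relation between signatures and tags. The following minimal Python sketch illustrates the idea with made-up signature identifiers, study names, and tag names; it is not GEN3VA's or GEO2Enrichr's actual data model.

```python
from collections import defaultdict

# Hypothetical signatures: an identifier, a source study, and a set of tags.
signatures = [
    {"id": "sig_1", "study": "GSE_A", "tags": {"AGING", "LIVER"}},
    {"id": "sig_2", "study": "GSE_B", "tags": {"AGING"}},
    {"id": "sig_3", "study": "GSE_C", "tags": {"LIVER"}},
]


def index_by_tag(sigs):
    """Map each tag to the sorted list of signature ids that carry it."""
    index = defaultdict(list)
    for sig in sigs:
        for tag in sig["tags"]:
            index[tag].append(sig["id"])
    return {tag: sorted(ids) for tag, ids in index.items()}


tag_index = index_by_tag(signatures)
print(tag_index["AGING"])  # signatures from two different studies share this tag
```

Because the tags are flat keywords rather than a hierarchy, a signature such as sig_1 simply appears under every tag it carries.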

A report is a GEN3VA page with multiple visualizations and analyses of gene signatures under a single tag. For example, the gene signatures associated with the tag AGING_BD2K_LINCS_DCIC_COURSERA can be found here, while the related report can be found here. A report can also be generated from a subset of signatures associated with a tag. This can be done using the custom report builder on each tag page.

Understanding reports

Finding consensus across gene signatures profiled under the same or similar conditions by independent studies can address important issues such as data reproducibility, assist in better understanding common biological mechanisms, or facilitate drug discovery by identifying consensus drugs that can mimic or reverse gene expression across a collection of signatures. The key idea underpinning GEN3VA is that gene signatures under a common biological theme may share common properties, such as genes in their gene lists, enrichment terms for gene set enrichment analysis results, or drugs that are predicted to reverse or mimic expression. A report is best understood as aggregated information, a consolidation of metadata, analyses, and visualizations on a collection of gene signatures around a common theme in an attempt to find consensus between signatures.
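
One simple way to picture consensus between signatures is to count how many gene lists in a collection contain each gene. The sketch below does this for toy gene lists; the function name `consensus_genes` and the `min_count` threshold are illustrative assumptions, not the method GEN3VA uses to build its reports.

```python
from collections import Counter


def consensus_genes(gene_lists, min_count=2):
    """Return (gene, count) pairs for genes found in at least min_count lists.

    Each list contributes at most once per gene. Results are ordered by
    descending count, then alphabetically.
    """
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    hits = [(gene, n) for gene, n in counts.items() if n >= min_count]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Toy gene lists standing in for the signatures under one tag.
collection = [
    ["STAT3", "ROCK2", "MYLPF"],
    ["STAT3", "ROCK2", "KIF2B"],
    ["STAT3", "CPSF3"],
]
shared = consensus_genes(collection, min_count=2)
print(shared)
```

The same counting idea could in principle be applied to enrichment terms or predicted drugs instead of genes.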

PCA plots and heatmaps

GEN3VA reports provide principal component analysis (PCA) plots (Fig. 1) and heatmaps (Fig. 2) of the gene signatures in a collection. These plots are interactive; you can rotate, zoom, and mouse over the plots to view more information. The heatmaps are built using Clustergrammer, a visualization application developed by the Ma'ayan Lab.

Fig. 1: Screenshot of the principal component analysis plot of 275 gene signatures extracted from studies that profiled the effect of adding endogenous ligands to mammalian cells.

Manipulating the heatmaps

Clustergrammer provides a good video tutorial for interacting with the heatmaps, but GEN3VA has a couple of additional features. First, all GEN3VA heatmaps have a dataset category. Gene signatures from the same study are grouped in the same category, the category being the study or dataset name. This is useful for checking whether clusters in the heatmap are merely artifacts of duplicate data or are genuinely interesting.

Second, you can remove a column from a dataset using shift + click. This allows you to probe what the heatmap will look like with specific signatures removed. A single click will sort the column.

These changes are not permanent, but you can build a custom report with any subset of signatures you would like using the custom report builder.

Fig. 2: Screenshot of the heatmap of the 116 gene signatures from the endogenous ligand collection.

Enrichment vector analysis with Enrichr

Enrichr is a web application for performing enrichment analysis on individual lists of genes. The input is a list of genes; optionally, each gene can be followed by a weight indicating the degree of membership in the gene list. The output is a list of enrichment terms from many gene set libraries. GEN3VA performs enrichment vector analysis with Enrichr by submitting each gene signature from a tagged collection in a report and then clustering the resultant terms.
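
The "enrichment vector" idea can be sketched as follows: each signature becomes a vector of scores over the union of enrichment terms, and signatures are then compared on those vectors. The scores, term names, and the choice of cosine similarity below are assumptions made for illustration; in a real analysis the scores would come from Enrichr, and the clustering method GEN3VA applies is not specified here.

```python
import math


def to_vectors(results):
    """Turn {signature: {term: score}} into aligned vectors over all terms.

    Terms missing from a signature's results get a score of 0.0.
    """
    terms = sorted({term for scores in results.values() for term in scores})
    vectors = {
        sig: [scores.get(term, 0.0) for term in terms]
        for sig, scores in results.items()
    }
    return terms, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up enrichment scores for three signatures.
enrichment = {
    "sig_1": {"term_A": 3.0, "term_B": 1.0},
    "sig_2": {"term_A": 6.0, "term_B": 2.0},
    "sig_3": {"term_C": 4.0},
}
terms, vectors = to_vectors(enrichment)
print(terms)
print(cosine(vectors["sig_1"], vectors["sig_2"]))  # same term profile
print(cosine(vectors["sig_1"], vectors["sig_3"]))  # no terms in common
```

A pairwise similarity of this kind is a typical input to hierarchical clustering, which would place sig_1 and sig_2 next to each other and sig_3 apart.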

L1000CDS2 plots

L1000CDS2 is a web application that queries gene expression signatures against the LINCS L1000 data to identify and prioritize small molecules that can either reverse or mimic the observed differential expression from the input signature. GEN3VA creates L1000CDS2 heatmaps by performing this analysis for every gene signature in the tagged collection and then clustering the scores for mimicking and reversing the expression pattern from the input signatures.

Custom reports

Sometimes a researcher may want to build a report from a subset of gene signatures from an existing collection. There are a few reasons to do this, such as wanting to categorize the signatures using a different metadata field or wanting to remove specific signatures for quality control reasons.

GEN3VA has a custom report builder (Fig. 3) that allows a user to select all or some of the gene signatures from an existing collection and build a report. To build a custom report for a collection, click on the Signatures and Custom Report Builder button located at the top of any report page and follow the instructions there.

Fig. 3: Screenshot of a user building a custom report from the AGING_BD2K_LINCS_DCIC_COURSERA collection. Gene signature selection not shown.

Methods

Implementation details and database

The GEN3VA web server is a Flask application running on an Apache HTTP Server with the Apache WSGI module installed. Flask is a WSGI-compliant framework for building web applications in Python. The application and its dependencies are packaged in a Docker container and deployed onto a 16-node computer cluster maintained by the Ma'ayan Lab.

The GEN3VA database runs on an internal MariaDB server, a drop-in replacement for MySQL, that is maintained by the Ma'ayan Lab. This database is shared by the GEN3VA and GEO2Enrichr applications. Both applications use the SQLAlchemy ORM. An ORM (object-relational mapper) is a framework that maps a tabular schema onto an object paradigm. GEN3VA and GEO2Enrichr share ORM models and utility functions for accessing gene signatures through an external project called Substrate.

API

POST /gen3va/api/1.0/upload

Uploads a complete gene signature to GEN3VA's database. Does not perform any cleanup or analysis.

ranked_genes: A list of ranked genes represented as an array of arrays. For each inner array, the first element is the gene symbol and the second element is the gene's weight or value.
diffexp_method: Differential expression method. Defaults to chdir.
cutoff: Optional. The maximum number of genes in the resultant gene list. Defaults to 500.
correction_method: Only applicable if diffexp_method is ttest. Defaults to BH for Benjamini-Hochberg.
threshold: Only applicable if diffexp_method is ttest. Defaults to 0.01.
gene: Optional. Name or symbol for relevant gene.
cell: Optional. Name of relevant cell type or tissue.
perturbation: Optional. Name of relevant perturbation.
disease: Optional. Name of relevant disease.
tags: Optional. An array of tag names to be assigned to the gene signature.

Example using the Python requests library:

import json
import requests

payload = {
    # Each inner array is [gene symbol, weight or value].
    'ranked_genes': [
        ['CPSF3', 0.000631847],
        ['CLEC18B', 0.00876892],
        ['RTDR1', 0.0000692485],
        ['MYLPF', 0.00218427],
        ['KIF2B', 0.0000457653],
        ['SMPDL3A', 0.00876879],
        ['FAM171A2', 0.00442025],
        ['RASGRF2', 0.0145588],
        ['ROCK2', 0.00218436]
    ],
    'diffexp_method': 'chdir',
    'tags': ['test_tag'],
    # Optional metadata fields; None is serialized as JSON null.
    'gene': 'STAT3',
    'cell': None,
    'perturbation': None,
    'disease': None
}

# Send the payload as a JSON-encoded request body.
resp = requests.post('http://amp.pharm.mssm.edu/gen3va/api/1.0/upload',
                     data=json.dumps(payload))
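
A request body of this shape can also be assembled programmatically. The sketch below uses a hypothetical helper, `build_upload_payload`, which is not part of GEN3VA; it converts a mapping of gene symbols to weights into the array-of-arrays form of ranked_genes and fills in the optional metadata fields from the parameter list. Genes are kept in the order given, since the API description does not state how they should be ordered.

```python
import json

# Optional metadata fields named in the parameter list.
OPTIONAL_FIELDS = ("gene", "cell", "perturbation", "disease")


def build_upload_payload(gene_weights, tags=(), diffexp_method="chdir", **metadata):
    """Build a dict shaped like the upload example's payload (hypothetical helper)."""
    unknown = set(metadata) - set(OPTIONAL_FIELDS)
    if unknown:
        raise ValueError("Unknown metadata fields: %s" % sorted(unknown))
    payload = {
        # Array of [symbol, weight] pairs, in the order supplied.
        "ranked_genes": [[symbol, weight] for symbol, weight in gene_weights.items()],
        "diffexp_method": diffexp_method,
        "tags": list(tags),
    }
    # Metadata fields that were not supplied are sent as null.
    for field in OPTIONAL_FIELDS:
        payload[field] = metadata.get(field)
    return payload


payload = build_upload_payload(
    {"CPSF3": 0.000631847, "ROCK2": 0.00218436},
    tags=["test_tag"],
    gene="STAT3",
)
body = json.dumps(payload)  # string suitable for the data= argument of requests.post
```

Rejecting unknown metadata keys up front is a design choice of this sketch; it catches misspelled field names before anything is sent to the server.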