Abstract
Interactive notebooks can make bioinformatics data analyses more transparent, accessible and reusable. However, creating notebooks requires computer programming expertise. Here we introduce BioJupies, a web server that enables automated creation, storage, and deployment of Jupyter Notebooks containing RNA-seq data analyses. Through an intuitive interface, novice users can rapidly generate tailored reports to analyze and visualize their own raw sequencing files, their gene expression tables, or fetch data from >5,500 published studies containing >250,000 preprocessed RNA-seq samples. Generated notebooks have executable code of the entire pipeline, rich narrative text, interactive data visualizations, and differential expression and enrichment analyses. The notebooks are permanently stored in the cloud and made available online through a persistent URL. The notebooks are downloadable, customizable, and can run within a Docker container. By providing an intuitive user interface for notebook generation for RNA-seq data analysis, starting from the raw reads, all the way to a complete interactive and reproducible report, BioJupies is a useful resource for experimental and computational biologists. BioJupies is freely available as a web-based application from: http://biojupies.cloud and as a Chrome extension from the Chrome Web Store.
Generating Notebooks
BioJupies enables users to generate customized analysis notebooks from RNA-seq data through an intuitive interface, without requiring any knowledge of coding.
Generating a notebook requires the following three steps:
  1. Select the data you wish to analyze. You can upload your own RNA-seq data - either as raw FASTQ files or as a table containing gene counts - or search from over 9,000 publicly available datasets published in the Gene Expression Omnibus (GEO) and processed by ARCHS4.
  2. Select data analysis tools. After selecting the data, you can choose from over 10 data visualization and analysis tools. These tools will be used to analyze the RNA-seq dataset you uploaded or selected, and embed interactive figures in the resulting notebook.
  3. Generate the notebook. Once you have selected the set of tools you wish to apply to the data, you can generate the notebook directly from the BioJupies website. The notebook will be returned to you through a permanent and public URL.
To start generating your notebook, please visit https://amp.pharm.mssm.edu/biojupies/analyze. First, you will be required to specify the dataset you wish to analyze. Three options are available:
  1. Search Published Datasets. The BioJupies search engine can be used to rapidly find datasets to analyze from over 9,000 publicly available published studies from GEO. Only studies that were processed by ARCHS4 are available for analysis with BioJupies.
  2. Upload Your Dataset. BioJupies enables you to upload your own RNA-seq dataset, either as raw FASTQ sequencing files, or as processed gene expression count tables.
  3. Use the Example Dataset. To learn how to generate a notebook with BioJupies, we provide an example dataset.
BioJupies allows you to generate customized analysis notebooks by uploading your own RNA-seq datasets. To upload your data, visit https://amp.pharm.mssm.edu/biojupies/upload.
Data can be uploaded in two formats:
  1. Gene Count Tables. If you have already performed sequence alignment, you can upload text files containing raw gene counts. The files must have gene symbols as the rows and samples as the columns. For more information on how to upload gene count tables, click here.
  2. FASTQ Files. You can upload raw sequencing data stored in the .fastq.gz format directly to BioJupies. BioJupies will align your reads to the selected reference genome and generate a notebook from the estimated gene counts. For more information on how to upload FASTQ files for alignment, click here.
Once you have uploaded your data, proceed to Adding Analysis Plugins.
In addition to uploading your own data, you can analyze datasets from over 9,000 published studies available from GEO.
To search the pre-processed RNA-seq datasets indexed by BioJupies, visit https://amp.pharm.mssm.edu/biojupies/analyze/search. You can find datasets relevant to search terms by submitting queries to the search engine. The results are sorted by date of publication. Filters let you select an organism and set the minimum and maximum number of samples contained within each dataset.
The More Info buttons provide access to a detailed description of the dataset, as well as a link to its landing page on GEO.
Once you have identified a dataset of interest, click on the Analyze button to proceed to the next step.
Once you have selected or uploaded a dataset to BioJupies, you can select from a range of data analysis and visualization tools to include in the final notebook report. The selected tools are plugins that will be used to analyze the dataset and embed interactive plots in the resulting notebook. BioJupies currently supports a variety of RNA-seq data analysis plugin tools divided into four categories: Exploratory Data Analysis, Differential Expression Analysis, Enrichment Analysis, and Small Molecule Query.
You can toggle the inclusion or exclusion of plugin tools by clicking on the Add or Remove buttons displayed on the right of each card. Additional information about each plugin, including links to interactive examples and to the plugin source code on GitHub, is available by clicking on More Info. Once you have selected the desired plugins, click Continue.
If you only selected plugin tools in the Exploratory Data Analysis category, proceed to Customizing Parameters; otherwise proceed to Selecting Groups.
If you selected one or more plugins in the Differential Expression, Enrichment Analysis or Small Molecule Query categories, you are required to specify two groups of samples.
The two groups will be used to perform the differential gene expression analysis and to identify a gene expression signature from the data. You can define the two groups of samples in two ways:
  1. Specify the group names in the input boxes provided. BioJupies will automatically predict the samples belonging to each group based on the sample metadata provided. You can turn off this feature by clicking on the toggle button displayed beside Predict Groups on the right.
  2. Use the dropdown buttons to manually select the samples assigned to each group.
In order to improve the statistical validity of the differential expression analysis, BioJupies requires that groups contain at least two RNA-seq samples.
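The group requirements above can be checked before you start, for example when planning which samples to compare. The following is a minimal sketch only: the substring matching in predict_group is a hypothetical stand-in and is not the method BioJupies uses to predict groups, and the function names, sample IDs and metadata strings are illustrative assumptions.

```python
def predict_group(group_name, sample_metadata):
    """Return the sample IDs whose metadata text mentions the group name.

    Case-insensitive substring matching is an assumption made for this
    sketch; it is NOT the predictor implemented by BioJupies.
    """
    needle = group_name.strip().lower()
    if not needle:
        return []
    return [sample for sample, text in sample_metadata.items()
            if needle in text.lower()]


def check_groups(group_a, group_b, min_samples=2):
    """Return a list of problems with two proposed sample groups."""
    problems = []
    for label, members in (("A", group_a), ("B", group_b)):
        if len(set(members)) < min_samples:
            problems.append(
                f"group {label} has fewer than {min_samples} samples")
    shared = sorted(set(group_a) & set(group_b))
    if shared:
        problems.append("samples assigned to both groups: " + ", ".join(shared))
    return problems


# Hypothetical sample metadata, for illustration only.
metadata = {
    "GSM1": "MCF7 control rep1",
    "GSM2": "MCF7 control rep2",
    "GSM3": "MCF7 treated rep1",
    "GSM4": "MCF7 treated rep2",
}
controls = predict_group("Control", metadata)
treated = predict_group("Treated", metadata)
print(check_groups(controls, treated))
```

The min_samples argument defaults to the two-sample minimum stated above and can be raised if a stricter minimum applies.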
Once you have selected the desired groups, click Continue to proceed to the next step.
After selecting the desired RNA-seq dataset and analysis plugin tools, you can customize your notebook by changing its title, adding subject tags, and modifying optional parameters for each of the selected plugins. To customize your analysis, click on the Modify Parameters button displayed on the right of each plugin. BioJupies provides the ability to modify the analysis parameters for a subset of RNA-seq analysis plugin tools.
Once you have selected the desired settings, click on Generate Notebook to generate the report.
After clicking on Generate Notebook, a notebook generation job is submitted to the BioJupies server, and a loader is displayed on the webpage. Notebook generation typically takes between 10 seconds and 3 minutes, depending on the number of plugin tools selected and the size of the RNA-seq dataset.
Once the notebook has been generated, a link is displayed on the BioJupies website. To access the results of your analysis, click on Open Notebook. The notebook is permanently stored on the BioJupies server and can be accessed at any time through the same persistent URL.
Uploading RNA-seq Data
BioJupies enables users to generate complete analysis notebooks from their own RNA-seq data through an intuitive web interface with no knowledge of coding required. To start uploading your data, visit https://amp.pharm.mssm.edu/biojupies/upload. Data can be uploaded in two formats:
  1. Gene Count Tables. If you have already performed sequence alignment, you can upload text files containing raw gene counts. The files must have gene symbols as the rows and samples as the columns. For more information on how to upload gene count tables, click here.
  2. FASTQ Files. You can upload raw sequencing data stored in the .fastq.gz format directly to BioJupies. BioJupies will align your reads to the selected reference genome and generate a notebook from the estimated gene counts. For more information on how to upload FASTQ files for alignment, click here.
If your RNA-seq data has already been processed, you can upload a table of gene expression counts at https://amp.pharm.mssm.edu/biojupies/upload/table. To upload successfully, your file must meet the following requirements:
  • Gene identifiers are the first term in each row. We recommend using HGNC gene symbols, since several of the plugin tools in the Enrichment Analysis and Small Molecule Query categories support these identifiers. BioJupies also supports other identifier systems such as Ensembl IDs and Entrez IDs; however, please note that the plugins in those two categories may not return optimal results with them.
  • Sample names are the column headers.
  • Values are raw gene expression counts. While users may successfully upload and analyze normalized RNA-seq datasets (such as FPKM values or log-transformed data), please note that the analysis plugins are currently optimized for analysis of raw gene counts only.
  • The table is stored in one of the following formats: .txt, .tsv, .csv, .xls or .xlsx. An example gene expression table file can be downloaded here.
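Before uploading, it can help to check a text-format table against the requirements listed above. The following is a minimal sketch, not part of BioJupies: the function name, the tab delimiter and the example values are assumptions, and the Excel formats (.xls, .xlsx) are not handled.

```python
import csv
import io


def check_count_table(text, delimiter="\t"):
    """Return a list of problems found in a delimited gene count table.

    Expected layout: a header row with sample names, then one row per gene
    with the gene identifier first and one raw count per sample.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if len(rows) < 2:
        return ["table needs a header row and at least one gene row"]
    header, body = rows[0], rows[1:]
    samples = header[1:]
    problems = []
    if not samples:
        problems.append("no sample columns found in the header")
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        if not row[0].strip():
            problems.append(f"line {line_no}: missing gene identifier")
        for sample, value in zip(samples, row[1:]):
            # Raw counts are non-negative integers; decimals such as FPKM
            # values or log-transformed data are reported.
            if not value.strip().isdigit():
                problems.append(
                    f"line {line_no}: {sample} value {value!r} is not a raw count")
    return problems


# Hypothetical table, for illustration only.
example = (
    "gene_symbol\tctrl_1\tctrl_2\tko_1\n"
    "TP53\t120\t98\t4\n"
    "GAPDH\t5021\t4877\t5103\n"
)
print(check_count_table(example))
```

For comma-separated files, pass delimiter="," instead of the default tab.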
If you have raw RNA-seq data in FASTQ format, you can upload the files for analysis by BioJupies at https://amp.pharm.mssm.edu/biojupies/upload/reads. To upload your files, select the FASTQ files you want to upload. Files must be compressed using gzip and must not individually exceed 5 GB in size.
Once you have selected the set of FASTQ files, click on Upload Files. BioJupies will then upload the files and prepare them for alignment with Elysium, our RNA-seq cloud alignment service. Upload may take from a minute to over an hour, depending on the size of the files and on the speed of your internet connection.
Once the upload process is completed, the Continue button will be activated and you will be able to proceed to the next step.
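The file requirements for FASTQ uploads (gzip compression, at most 5 GB per file) can be checked locally before starting a long upload. The following is a minimal sketch, not part of BioJupies: the function name is hypothetical, and interpreting 5 GB as 5 * 1024^3 bytes is an assumption. The demonstration writes two small temporary files so that the example runs on its own.

```python
import gzip
import os
import tempfile

MAX_BYTES = 5 * 1024 ** 3   # 5 GB limit; the binary interpretation is an assumption
GZIP_MAGIC = b"\x1f\x8b"    # first two bytes of every gzip stream


def check_fastq_upload(path):
    """Return a list of problems that would likely block an upload."""
    problems = []
    if not path.endswith(".fastq.gz"):
        problems.append("file name does not end in .fastq.gz")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 5 GB")
    with open(path, "rb") as handle:
        if handle.read(2) != GZIP_MAGIC:
            problems.append("file is not gzip-compressed")
    return problems


# Demonstration on two tiny temporary files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "sample_1.fastq.gz")
    with gzip.open(good, "wt") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    bad = os.path.join(tmp, "sample_2.fastq")
    with open(bad, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    good_report = check_fastq_upload(good)
    bad_report = check_fastq_upload(bad)

print(good_report)
print(bad_report)
```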
After the FASTQ files have been uploaded, you can configure the alignment process. In order to estimate expression levels for each gene, you need to specify two parameters:
  1. The organism from which the RNA-seq samples were generated. This parameter will be used to select the reference genome for the alignment.
  2. Whether the FASTQ files were generated using single- or paired-end sequencing.
If you select single-end sequencing, the FASTQ files will be individually aligned to the reference genome and will each return one column in the resulting dataset. If you select paired-end sequencing, you will be asked to specify which pairs of FASTQ files have been generated from the same biological sample.
The pairs will be used to perform paired alignment and will each return one column in the resulting dataset. Samples can be optionally renamed to be more descriptive using the text boxes provided. Once you have selected the desired alignment settings, click Continue to launch the alignment jobs, then proceed to the next step.
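For paired-end data, it can be useful to work out the file pairs before specifying them in the interface. The following is a minimal sketch, assuming that mate files are named with an _R1/_R2 or _1/_2 suffix before .fastq.gz. This naming convention is an assumption about your files, not a BioJupies requirement, and the function name and file names are hypothetical.

```python
import re

# Assumed naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# or <sample>_1.fastq.gz / <sample>_2.fastq.gz.
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_R?(?P<mate>[12])\.fastq\.gz$")


def pair_fastq_files(filenames):
    """Group file names into (mate 1, mate 2) pairs keyed by sample name.

    Returns the pairs and a sorted list of files that could not be paired.
    """
    mates = {}
    unmatched = []
    for name in sorted(filenames):
        match = MATE_PATTERN.match(name)
        if match is None:
            unmatched.append(name)
            continue
        mates.setdefault(match.group("sample"), {})[match.group("mate")] = name
    pairs = {}
    for sample, found in mates.items():
        if set(found) == {"1", "2"}:
            pairs[sample] = (found["1"], found["2"])
        else:
            unmatched.extend(found.values())
    return pairs, sorted(unmatched)


# Hypothetical file names, for illustration only.
files = [
    "ko_R2.fastq.gz", "ko_R1.fastq.gz",
    "wt_1.fastq.gz", "wt_2.fastq.gz",
    "extra_R1.fastq.gz", "notes.txt",
]
pairs, unmatched = pair_fastq_files(files)
print(pairs)
print(unmatched)
```

Files reported as unmatched either lack a mate or do not follow the assumed naming pattern and need to be paired by hand.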
Once the alignment jobs are launched, you will be redirected to a page displaying the status of each job. Alignment jobs may have four statuses:
  • Waiting. This indicates that your sample is in the queue. It will be executed when resources are available.
  • Submitted. This indicates that the job is currently executing.
  • Completed. This indicates that the job was completed successfully.
  • Failed. This indicates the job has failed.
Most jobs should be complete in around 10-15 minutes. BioJupies currently supports concurrent alignment of two jobs at a time. If a user requests more than two jobs at the same time, the additional jobs will be added to the queue.
The alignment jobs will continue to run if the user exits the page. The job status will remain accessible from the same unique URL.
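The figures above (two concurrent jobs, roughly 10-15 minutes each) allow a rough estimate of the total alignment time for a batch of samples. The following is a back-of-the-envelope sketch only: it assumes that all jobs take about the same time, that they run in consecutive waves of two, and that no other users' jobs are ahead in the queue. The function name is hypothetical.

```python
import math


def estimate_alignment_minutes(n_jobs, concurrent=2, per_job_minutes=(10, 15)):
    """Return a (low, high) estimate in minutes for aligning n_jobs samples."""
    if n_jobs < 0 or concurrent < 1:
        raise ValueError("n_jobs must be >= 0 and concurrent must be >= 1")
    # Jobs beyond the concurrency limit wait in the queue, so the batch
    # is processed in ceil(n_jobs / concurrent) consecutive waves.
    waves = math.ceil(n_jobs / concurrent)
    low, high = per_job_minutes
    return waves * low, waves * high


print(estimate_alignment_minutes(5))
```

With five samples there are three waves, so the estimate is between 30 and 45 minutes.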
Once the alignment jobs are all completed, click Continue to proceed to the next step.
Before analyzing your RNA-seq data, BioJupies asks that you add information that better describes the samples contained within your dataset. Such information can be the experimental group each sample belongs to, for example, wild type vs. knock-out, or normal vs. diseased.
This information can be used to extract knowledge from the data more efficiently in the subsequent downstream analysis steps. You can add information about your RNA-seq samples in two ways:
  1. By manually specifying groups the samples belong to, using the text boxes on the left.
  2. By uploading a table containing information about the specified samples. The table must have sample names on the rows, and can contain any number of columns specifying different properties of the uploaded samples. An example metadata file can be downloaded here.
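A metadata table with sample names on the rows and one column per property can be written with a few lines of code. The following is a minimal sketch: the tab-separated layout, the "sample" header and the property names are assumptions made for illustration, so compare the result with the example metadata file before uploading.

```python
import csv
import io


def metadata_tsv(samples):
    """Build a tab-separated metadata table from {sample name: {property: value}}.

    Each sample becomes one row; properties become columns. Missing values
    are left empty.
    """
    columns = sorted({key for props in samples.values() for key in props})
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample"] + columns)
    for name, props in samples.items():
        writer.writerow([name] + [props.get(col, "") for col in columns])
    return buffer.getvalue()


# Hypothetical samples, for illustration only.
table = metadata_tsv({
    "ctrl_1": {"group": "wild type"},
    "ko_1": {"group": "knock-out", "batch": "2"},
})
print(table)
```

The sample names in the first column should match the column names of the uploaded expression table.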
Once you have added the required information, click Continue to upload your dataset to BioJupies. You can then proceed to Adding Analysis Plugins.
Chrome Extension
In addition to the web server, BioJupies is freely available as a Chrome browser extension. The extension adds functionality to the GEO DataSet browser (https://www.ncbi.nlm.nih.gov/gds), and is available for download from the Chrome Web Store.
The BioJupies Chrome extension embeds buttons and a user interface that enable users to automatically generate notebooks directly from the GEO website. With the Chrome extension, users can launch BioJupies notebooks without having to leave the GEO website.
The Chrome Extension can be installed for free from the Chrome Web Store at the following URL: https://chrome.google.com/webstore/detail/biojupies-generator/picalhhlpcjhonibabfigihelpmpadel. To install the extension, click on the Add to Chrome button displayed on the web store. Once installed, the BioJupies icon will be displayed on the top right corner of your Chrome window.
Once the Chrome Extension is installed, it will be activated on the GEO DataSet browser.
The extension adds a button near each RNA-seq dataset. Only studies that have processed RNA-seq data from ARCHS4 are available for analysis using BioJupies.
An example is shown below for the GEO series GSE88741 (https://www.ncbi.nlm.nih.gov/gds/?term=GSE88741). By clicking on the button, users can proceed to Adding Analysis Plugins in order to generate a customized Jupyter Notebook from the selected dataset entirely through a pop-up window user interface.
Note: BioJupies currently supports ~9,000 RNA-seq datasets processed by ARCHS4. Since the Gene Expression Omnibus indexes a large number of datasets which have not been generated using RNA-sequencing, many search results will not have a Generate Notebook button.
For a complete list of processed datasets see https://amp.pharm.mssm.edu/biojupies/analyze/search.
After clicking on the Generate Notebook button, you can add one or more RNA-seq analysis plugin tools. The selected plugins will be used to analyze the dataset and embed interactive plots in the resulting notebook. You can toggle the inclusion or exclusion of plugins by clicking on each tool's card. Once you have selected the desired set of plugin tools, click on Next.
If you only selected plugin tools from the Exploratory Data Analysis category, proceed to Customizing Parameters; otherwise proceed to Selecting Groups.
If you selected one or more plugins from the Differential Expression, Enrichment Analysis or Small Molecule Query categories, you are asked to specify two groups of samples.
The two groups will be used to perform the differential gene expression analysis and to identify a gene expression signature from the data. To define the groups of samples:
  1. Select the IDs on the left column to assign samples to Group A.
  2. Select the IDs on the right column to assign samples to Group B.
  3. It is recommended that you rename the groups using the text boxes displayed at the bottom of the window to make the group names more descriptive.
In order to improve the statistical validity of the differential expression analysis, BioJupies requires that each group contain at least three RNA-seq samples.
Once you have selected and labeled the two groups, click Next to proceed to the next step.
After selecting the desired RNA-seq dataset and choosing the analysis plugin tools, you can customize your notebook by changing its title and modifying other optional parameters for each of the selected tools. To customize your analysis, click on the button displayed on the right of each plugin name. BioJupies enables you to optionally modify analysis parameters for a subset of RNA-seq analysis plugins.
Once you have selected the desired settings, click on Generate Notebook to generate the analysis.
After clicking on Generate Notebook, a notebook generation job is submitted to the BioJupies server, and a loader is displayed in the window. Notebook generation typically takes between 10 seconds and 3 minutes, depending on the number of plugins selected and the size of the RNA-seq dataset.
Once the notebook is generated, a link is displayed in the resultant window. To access the report of your analysis, click on Open Notebook. The notebook is permanently stored on the BioJupies web server and can be accessed at any time through the same persistent URL.
Video Tutorials
Introduction to the BioJupies web server and video tutorial playlist.
Step-by-step demonstration on uploading raw FASTQ files and launching alignment jobs from BioJupies.
Step-by-step demonstration on uploading processed RNA-seq data to BioJupies.
Step-by-step demonstration on searching and analyzing GEO data from BioJupies.
Instructions on generating a custom Jupyter Notebook using the BioJupies user interface.
Exploring the results contained within a Jupyter Notebook generated using BioJupies.
Overview of how users can download, execute and customize BioJupies Notebooks on their local computer (Mac OS shown). For more information about this topic please refer to the Reusing Notebooks section.
Instructions on installing and using the BioJupies Google Chrome extension. For more information please refer to the Chrome Extension section.
Troubleshooting
This section contains instructions on troubleshooting issues related to BioJupies notebooks.
BioJupies relies on the following software to generate and display notebooks:
  1. Jupyter Notebook (http://jupyter.org/) to create notebook files.
  2. Jupyter nbviewer (http://nbviewer.jupyter.org/) to render notebooks online.
  3. WebGL (https://www.khronos.org/webgl/) to display interactive 2D and 3D visualizations.
Support for WebGL is present in Firefox 4+, Google Chrome 9+, Opera 12+, Safari 5.1+, Internet Explorer 11+, and Microsoft Edge build 10240+. However, the user's device must also have hardware that supports these features.
In some cases, users may experience issues visualizing interactive plots on generated notebooks. These may include:
  • Plots appearing blank in the notebook. This is likely caused by an error loading WebGL in your browser. The issue can usually be resolved by clicking on the empty plot area, refreshing the notebook URL, or opening the URL in a different window or browser.
  • Plots displaying the "WebGL is not supported by your browser" error message. This issue is commonly resolved by updating your browser to the newest available version.
  • Notebooks only displaying partially. This may be caused by a problem with the nbviewer notebook rendering service, or by an error loading WebGL in your browser. It is commonly resolved by refreshing the notebook URL, or opening the URL in a different window or browser.
If one or more of these issues persist, follow the instructions below to Generate Static Notebooks.
If you are experiencing problems displaying interactive visualizations, BioJupies provides you with the option to generate notebooks containing static plots. Since static plots do not require WebGL and need less memory to render, this option allows generated notebooks to be displayed more reliably.
The option can be selected in two ways. First, you can toggle it on individual plots by selecting the Plot Type: 'static' parameter in the final step of notebook generation. Second, you can toggle the option for all plots in a notebook by selecting the "Would you like your notebook to only display static plots?" option. When this option is selected, BioJupies will force all plots to be displayed statically in the final Jupyter Notebook.
Citation
To acknowledge BioJupies in your publications, please use the following reference: https://www.cell.com/cell-systems/fulltext/S2405-4712(18)30432-0.
Terms of Use
Source code for BioJupies is available at https://github.com/MaayanLab/biojupies under the Apache License 2.0.
Provided gene expression files are available under the Creative Commons Attribution 4.0 International License.
BioJupies is free to use for non-commercial purposes. Commercial users should contact MSIP.