Getting Started

About Datasets2Tools

1. Schematic overview of the Datasets2Tools pipeline.
Datasets2Tools is a platform for the discovery and evaluation of biomedical digital objects. It indexes over 30,000 bioinformatics analyses, over 6,000 biological datasets and over 4,000 computational tools. The analyses include a wide variety of enrichment analyses, gene interaction networks and interactive data visualizations; the datasets and tools span technologies such as microarray, RNA-seq, proteomics, and many others. Users can search these objects through the interactive search interfaces, the Google Chrome extension, and an API.
The Canned Analysis

2. Structure of the canned analysis digital object.
The canned analysis is a new type of digital object designed to store the results of bioinformatics analyses in a findable, accessible, interoperable and reusable manner. It is defined by three sets of features:
  • Core elements:
    • Analysis Title, a brief description of the analysis.
    • Analysis Description, a longer explanation of the analysis aims and methods.
    • Analysis URL, a link to an external webpage which stores the results of the analysis.
  • Associated objects:
    • Dataset(s) used to generate the analysis, specified by accession.
    • Tool(s) used to analyze the data, specified by name.
  • Annotations:
    • Keywords, an unstructured set of tags related to the analysis.
    • Metadata Tags, a structured set of key-value pairs related to the analysis.
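As an illustration, the three sets of features listed above could be modeled as a single record. This is a minimal sketch only: the class name, field names and example values are hypothetical and are not taken from the Datasets2Tools database schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class CannedAnalysis:
    """Hypothetical record mirroring the three feature sets of a canned analysis."""

    # Core elements
    title: str
    description: str
    url: str
    # Associated objects: datasets by accession, tools by name
    dataset_accessions: list[str] = field(default_factory=list)
    tool_names: list[str] = field(default_factory=list)
    # Annotations: unstructured keywords and structured key-value tags
    keywords: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


# All values below are placeholders, not real Datasets2Tools entries.
example = CannedAnalysis(
    title="Enrichment analysis of upregulated genes",
    description="Placeholder description of the analysis aims and methods.",
    url="https://example.org/analysis/1",
    dataset_accessions=["GSE00000"],
    tool_names=["ExampleTool"],
    keywords=["enrichment"],
    metadata={"organism": "human", "geneset_direction": "upregulated"},
)
```

Keeping keywords as a plain list and metadata tags as a dictionary reflects the distinction made above between unstructured tags and structured key-value pairs.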
Canned Analysis Examples

3. Example of analyses which can be generated from an individual RNA-seq dataset.
The canned analysis can store information about a wide variety of bioinformatics analyses, for example:
  • Interactive visualizations of different types of data, including transcriptomics, proteomics and genomics (e.g. DCA00000037).
  • Enrichment analyses of genes upregulated and downregulated in diseases, small molecule perturbations and gene knockdown experiments (e.g. DCA00059591).
  • Interaction networks of genesets (e.g. DCA00033271, DCA00033272).
  • Signature queries to identify small molecules which mimic or reverse gene expression signatures of diseases, small molecule perturbations or gene knockdown experiments (e.g. DCA00009676, DCA00009677).


4. Screenshot of the analysis search interface.
The analysis search interface lets users search canned analyses using text-based search and a variety of filtering options:
  • The search bar performs a text-based query on the canned analysis title, description, and the datasets and tools associated with it.
  • The Dataset Accession filter restricts results to canned analyses generated from the desired dataset.
  • The Tool filter restricts results to canned analyses generated using the desired tool.
  • The additional metadata filters restrict results based on various metadata tags, such as the direction of the geneset or the organism.
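The way these filters combine can be sketched in a few lines. The example below is an assumption-laden illustration, not the site's implementation: the record layout, the key names and the `matches` helper are hypothetical, and the text search is reduced to a case-insensitive substring match.

```python
def matches(analysis, query="", dataset=None, tool=None, metadata=None):
    """Return True if one analysis record passes every active filter."""
    # Text search over title, description, datasets and tools.
    haystack = " ".join(
        [analysis["title"], analysis["description"]]
        + analysis["datasets"]
        + analysis["tools"]
    ).lower()
    if query and query.lower() not in haystack:
        return False
    # Dataset Accession and Tool filters.
    if dataset is not None and dataset not in analysis["datasets"]:
        return False
    if tool is not None and tool not in analysis["tools"]:
        return False
    # Every requested metadata tag must be present with the same value.
    for key, value in (metadata or {}).items():
        if analysis["metadata"].get(key) != value:
            return False
    return True


# Placeholder records; accessions and tool names are invented.
analyses = [
    {"title": "Enrichment of upregulated genes", "description": "Placeholder.",
     "datasets": ["GSE00001"], "tools": ["ToolA"],
     "metadata": {"organism": "human", "direction": "up"}},
    {"title": "Interaction network", "description": "Placeholder.",
     "datasets": ["GSE00002"], "tools": ["ToolB"],
     "metadata": {"organism": "mouse"}},
]

hits = [a for a in analyses
        if matches(a, query="enrichment", metadata={"organism": "human"})]
```

Here `hits` contains only the first record, because it is the only one that satisfies both the text query and the organism tag.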
5. Overview of the canned analysis card.
The canned analysis card displays summary information about the analysis, including its title, description, and name of the tool(s) used to generate it. More information on the FAIRness Insignia is available here.

6. Screenshot of the dataset search interface.
The dataset search interface lets users search datasets using text-based search and sort them by a variety of metrics:
  • The search bar performs a text-based query on the dataset title, description, and the repository of origin.
  • The sorting options sort datasets by the number of associated analyses and the results of FAIR evaluations.
7. Overview of the dataset card.
The dataset card displays summary information about the dataset, including its title, description, and number of analyses it has been used to generate. More information on the FAIRness Insignia is available here.

The tool search interface lets users search tools using text-based search and a variety of sorting options:
  • The search bar performs a text-based query on the tool name, its description, and the title, abstract and authors of the publications associated with it.
  • The sorting options sort tools by the number of associated analyses, the results of FAIR evaluations, and several metrics related to the publications associated with the tool, such as number of citations, Altmetric Attention Scores and PlumX Metrics.
8. Overview of the tool card.
The tool card displays summary information about the tool and the publications associated with it:
  • The name and description of the tool.
  • The number of citations from PubMed.
  • The Altmetric Attention Score and badge. By hovering over it, the user can reveal detailed information about the metrics of the publications associated with the tool.
  • The PlumX Metrics badge. By hovering over it, the user can reveal detailed information about the metrics of the publications associated with the tool.
  • The FAIR Insignia. More information available here.
Landing Pages

Analysis Pages

9. Canned analysis landing page.
The canned analysis landing pages are divided into the following sections:
  • The Overview tab, which contains general information about the analysis, the annotations associated with it, and similar canned analyses.
  • The Datasets tab, which displays the datasets used to generate the analysis.
  • The Tools tab, which displays the tools used to analyze the data.
  • The FAIR Metrics tab, which contains the FAIR evaluation form (more information here). The evaluation form is only available to registered users.
Dataset Pages

9. Dataset landing page.
The dataset landing pages are divided into the following sections:
  • The Overview tab, which contains general information about the dataset, its repository, and similar datasets.
  • The Tools tab, which displays the tools which have been used to generate canned analyses using the data.
  • The Analyses tab, which displays the canned analyses generated using the dataset.
  • The FAIR Metrics tab, which contains the FAIR evaluation form (more information here). The evaluation form is only available to registered users.
Tool Pages

10. Tool landing page.
The tool landing pages are divided into the following sections:
  • The Overview tab, which contains general information about the tool, the publications associated with it, and similar tools.
  • The Datasets tab, which displays the datasets which have been analyzed by the tool to generate canned analyses.
  • The Analyses tab, which displays the canned analyses generated using the tool.
  • The FAIR Metrics tab, which contains the FAIR evaluation form (more information here). The evaluation form is only available to registered users.
FAIR Evaluations

FAIR Principles

The FAIR Principles are a concise set of high-level principles describing the characteristics that digital objects should possess in order to be findable, accessible, interoperable and reusable. These principles have been described in the following publication by Wilkinson et al., https://www.nature.com/articles/sdata201618.
Evaluating Objects

11. Canned analysis evaluation form.
The FAIRness evaluation forms allow users to evaluate digital objects by answering a set of nine yes/no questions which describe properties related to the object's findability, accessibility, interoperability and reusability. Canned analyses, datasets and tools each have a different set of questions which are designed to address properties specific to each object type.
FAIRness Insignia

12. Expanded evaluation insignia.
The expanded evaluation insignia displays a summary of the FAIR evaluations as a grid. It is available on the landing pages of analyses, datasets and tools. Each grid square corresponds to one question, and its color represents the scores given for that question, ranging from blue (100% positive answers) to red (100% negative answers). Users can view the question and browse the scores by hovering over each grid element.
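One way to picture the color scale is as a blend between the two endpoints. The sketch below assumes a linear RGB interpolation between pure red and pure blue; the actual color scale used by the insignia is not specified here, and the `insignia_color` function is hypothetical.

```python
def insignia_color(positive, total):
    """Map the share of positive answers to an (R, G, B) triple from red to blue."""
    if total == 0:
        # No evaluations yet; how the site displays this case is not stated.
        return None
    share = positive / total
    red = round(255 * (1 - share))   # 100% negative answers -> full red
    blue = round(255 * share)        # 100% positive answers -> full blue
    return (red, 0, blue)


all_positive = insignia_color(3, 3)   # (0, 0, 255)
all_negative = insignia_color(0, 4)   # (255, 0, 0)
```

Mixed results fall between the two endpoints, so a question with mostly positive answers is drawn closer to blue than to red.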
13. Compact evaluation insignia.
The compact insignia displays a summary of the FAIR evaluations as a round badge. It represents the number of questions to which the majority of users have answered positively, and ranges from 0 to 9. It is available on the search results pages of analyses, datasets and tools. By hovering over the badge, users can visualize the results of evaluations for individual questions.
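The number shown on the compact badge can be illustrated with a short calculation. This is a sketch under stated assumptions: answers are represented as booleans, a "majority" is taken to mean strictly more than half of the answers, and questions with no answers or a tie do not count. How Datasets2Tools handles ties is not described here.

```python
def compact_score(answers_per_question):
    """Count the questions where positive answers form a strict majority."""
    score = 0
    for answers in answers_per_question:
        positives = sum(1 for answer in answers if answer)
        if positives * 2 > len(answers):
            score += 1
    return score


# Invented yes/no answers for the nine questions of one evaluated object.
evaluations = [
    [True, True, False],   # majority positive
    [False, False, True],  # majority negative
    [True, True],          # majority positive
    [True, False],         # tie, not counted (assumption)
    [False],               # majority negative
    [True],                # majority positive
    [True, True, True],    # majority positive
    [False, False],        # majority negative
    [True, False, True],   # majority positive
]

badge_value = compact_score(evaluations)  # 5
```

With nine questions, the result always falls between 0 and 9, which matches the range of the badge described above.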
Contribute

How to contribute

Users can contribute to the Datasets2Tools database by uploading their analyses from the Contribute page. Contributions are made by uploading an Excel sheet which contains information about canned analyses in each row. A template of the Excel sheet can be downloaded here.
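To give a feel for the one-analysis-per-row layout, the sketch below writes a single analysis as a row of comma-separated text using only the Python standard library. The column names, the separators used for keywords and metadata, and the example values are all hypothetical; the downloadable template defines the actual columns and must be used for real contributions, which are uploaded as an Excel sheet rather than CSV.

```python
import csv
import io

# Hypothetical column names; the real template defines its own.
COLUMNS = ["analysis_title", "analysis_description", "analysis_url",
           "dataset_accession", "tool_name", "keywords", "metadata"]


def to_row(analysis):
    """Flatten one analysis record into a single spreadsheet row."""
    return {
        "analysis_title": analysis["title"],
        "analysis_description": analysis["description"],
        "analysis_url": analysis["url"],
        "dataset_accession": analysis["dataset"],
        "tool_name": analysis["tool"],
        "keywords": ", ".join(analysis["keywords"]),
        "metadata": "; ".join(f"{k}={v}" for k, v in analysis["metadata"].items()),
    }


# Placeholder record, not a real Datasets2Tools entry.
analysis = {
    "title": "Enrichment analysis of upregulated genes",
    "description": "Placeholder description.",
    "url": "https://example.org/analysis/1",
    "dataset": "GSE00000",
    "tool": "ExampleTool",
    "keywords": ["enrichment", "upregulated"],
    "metadata": {"organism": "human"},
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(to_row(analysis))
sheet_text = buffer.getvalue()
```

Each additional analysis would add one more row under the same header.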
Chrome Extension

What does the Chrome Extension do?

13. Screenshot of the Chrome Extension on a GEO Search page (link).
The Datasets2Tools Chrome Extension enriches the websites of two biomedical data repositories, the Gene Expression Omnibus and DataMed, by adding interfaces containing links to analyses run on the datasets indexed by these resources. These links are added to the search results and to the landing pages of each dataset for which Datasets2Tools has indexed at least one canned analysis. This helps users extract information more efficiently when working with these repositories.
Installing the Chrome Extension

14. Screenshot of the Datasets2Tools page on the Chrome Web Store (link).
The Datasets2Tools Chrome Extension can be installed for free from the Chrome Web Store at the following link: https://chrome.google.com/webstore/detail/datasets2tools/fbamphimpljabaailcaidmeegpcpkdel. The extension can only be installed by users of the Google Chrome web browser.