ARCHS4 Downloads


This section contains all files created for the ARCHS4 website. The methods are described at . For help accessing the files, refer to the Help section or contact us directly. The database will be updated on a regular basis, and old versions of the files will remain accessible.

If you would like to receive updates on the ARCHS4 data and stay informed about new data releases, consider signing up for the newsletter.

Expression (gene level)

Human

Mouse

Expression files for mouse and human in HDF5 format. All counts are at the gene level (Entrez Gene Symbol). For compression purposes, the Kallisto pseudocounts are rounded to integer values.
human_matrix.h5 v6
Date: 9/2018
Size: 4.0 GB
R human_matrix.rda v6
Date: 11/2018
Size: 4.0 GB
mouse_matrix.h5 v6
Date: 9/2018
Size: 4.1 GB
R mouse_matrix.rda v6
Date: 11/2018
Size: 4.0 GB
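
As a rough illustration of why rounding helps compression, the sketch below packs synthetic pseudocounts once as 64-bit floats and once as rounded 32-bit integers, then compresses both with zlib. Everything here is an assumption made for illustration: the values are randomly generated rather than taken from ARCHS4, and the storage widths and the zlib compressor do not necessarily match the settings used in the HDF5 files.

```python
import random
import struct
import zlib


def compressed_size(values, fmt):
    """Pack values with a struct format code and return the zlib-compressed size in bytes."""
    raw = struct.pack(f"<{len(values)}{fmt}", *values)
    return len(zlib.compress(raw, 6))


# Synthetic pseudocounts (illustrative only, not ARCHS4 data).
rng = random.Random(0)
pseudocounts = [rng.expovariate(1 / 50.0) for _ in range(10_000)]
rounded = [round(x) for x in pseudocounts]

float_size = compressed_size(pseudocounts, "d")  # 64-bit floats
int_size = compressed_size(rounded, "i")         # 32-bit integers

print(f"compressed floats:   {float_size} bytes")
print(f"compressed integers: {int_size} bytes")
```

The fractional part of a pseudocount looks like noise to a general-purpose compressor, so dropping it tends to shrink the stored data considerably.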

Expression (transcript level)

Human

Mouse

Expression files for mouse and human in HDF5 format. All measurements are at the transcript level (Ensembl ID). For compression purposes, the Kallisto pseudocounts are rounded to integer values.
human_transcript.h5
Date: 9/2018
Size: 14.5 GB
mouse_transcript.h5
Date: 9/2018
Size: 10.1 GB

TPM (transcript level)

Human

Mouse

Expression files for mouse and human in HDF5 format. All measurements are at the transcript level (Ensembl ID). The files are very large and values are not rounded.
human_transcript_tpm.h5
Date: 9/2018
Size: 52.0 GB
mouse_transcript_tpm.h5
Date: 9/2018
Size: 36.0 GB
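
For readers unfamiliar with the unit, the following sketch shows the standard definition of TPM (transcripts per million): each count is divided by the transcript's effective length, and the resulting rates are rescaled to sum to one million. Kallisto reports TPM directly, so this is not necessarily how the files above were produced. The function name and the toy numbers are hypothetical.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert estimated counts to TPM using per-transcript effective lengths."""
    # Length-normalized rate for each transcript.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that all values sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy values for three hypothetical transcripts.
counts = [10, 20, 30]
lengths = [1000.0, 2000.0, 1500.0]
tpm = counts_to_tpm(counts, lengths)
print(tpm)
```

Because of the division and rescaling, TPM values are generally fractional, which is consistent with these files storing unrounded values.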

Expression (GCTx)

Human

Mouse

Expression files for human in GCTx format (Broad). All measurements are at the gene level (Entrez Gene Symbol). For more information about the format and software packages, refer to this bioRxiv paper.
human_eid_1.0.gctx
Date: 2/2018
Size: 10.7 GB

t-SNE sample coordinates

Human

Mouse

Gene expression reduced to 3 dimensions. Each file contains 4 columns: the first 3 hold the x, y, z coordinates and the last holds the numeric part of the GSM ID (GSM123456 -> 123456).
sample_human_tsne.csv v2
Date: 3/2018
Size: 2.9 MB
sample_mouse_tsne.csv v2
Date: 3/2018
Size: 3.5 MB
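
A minimal sketch of reading the four-column sample coordinate layout described above and rebuilding the full GSM accession from its numeric part. It assumes the file is comma-separated; rows that do not parse as numbers (for example a header row, if one exists) are skipped. The function name and the example rows are made up for illustration.

```python
import csv
import io


def read_sample_coordinates(handle):
    """Yield (gsm_id, (x, y, z)) tuples from a 4-column coordinate file."""
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        try:
            x, y, z = (float(v) for v in row[:3])
            gsm_number = int(float(row[3]))
        except ValueError:
            continue  # skip non-numeric rows such as a possible header
        yield f"GSM{gsm_number}", (x, y, z)


# Made-up rows standing in for the contents of a coordinate file.
example = io.StringIO("1.5,-2.0,0.25,123456\n-0.5,3.0,1.0,234567\n")
coordinates = list(read_sample_coordinates(example))
print(coordinates)
```

To read a downloaded file instead of the in-memory example, pass a handle opened with open("sample_human_tsne.csv", newline="").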

t-SNE gene coordinates

Human

Mouse

Gene expression reduced to 3 dimensions. Each file contains 4 columns: the first 3 hold the x, y, z coordinates and the last holds the Entrez gene symbol.
gene_human_tsne.csv v2
Date: 3/2018
Size: 741.0 KB
gene_mouse_tsne.csv v2
Date: 3/2018
Size: 606.0 KB

GSM meta information

Human

Mouse

Meta information retrieved from GEO for GSM samples. The data is stored as a list object in an RDS file that can be loaded into memory in R.
human_gsm_meta.rda v4
Date: 6/2018
Size: 8.0 MB
mouse_gsm_meta.rda v4
Date: 6/2018
Size: 10.7 MB

Gene correlation

Human

Mouse

Pairwise Pearson correlation of genes across expression samples.
human_correlation.rda v1.0
Date: 10/2017
Size: 5.0 GB
mouse_correlation.rda v1.0
Date: 8/2017
Size: 3.0 GB
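
To make the contents of these files concrete, the sketch below computes pairwise Pearson correlations between genes across samples on a toy expression table. The gene names and values are hypothetical, and any normalization or transformation applied before computing the published correlation matrices is not described here and is therefore left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for constant expression
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlation(expression):
    """Map every ordered gene pair to its correlation across samples."""
    genes = sorted(expression)
    return {(g, h): pearson(expression[g], expression[h])
            for g in genes for h in genes}


# Toy table: gene symbol -> expression values, one value per sample.
expression = {
    "GENE_A": [1, 2, 3, 4],
    "GENE_B": [2, 4, 6, 8],
    "GENE_C": [4, 3, 2, 1],
}
correlation = pairwise_correlation(expression)
print(correlation[("GENE_A", "GENE_B")], correlation[("GENE_A", "GENE_C")])
```

The result is symmetric with ones on the diagonal, so a full gene-by-gene matrix grows quadratically with the number of genes, which is in line with the multi-gigabyte file sizes listed above.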

JL transformed expression

Human

Mouse

Gene expression compressed with the Johnson-Lindenstrauss transformation. The RDA files can be loaded into a running R environment with the "load" command. Loading a file creates two variables: the transform matrix used for the projection and the jl_expression matrix. The original dimensions are reduced to 1000. The original distances and correlations of the samples should be approximately preserved.
compressed_human_1000.rda v2
Date: 3/2018
Size: 1.0 GB
compressed_mouse_1000.rda v2
Date: 3/2018
Size: 1.19 GB
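
The idea behind a Johnson-Lindenstrauss projection can be sketched with a Gaussian random matrix, as below. This is an assumption-laden toy: the way the ARCHS4 transform matrix was constructed is not stated here, the dimensions are much smaller than the real data (which is reduced to 1000), and the two samples are random vectors rather than expression profiles.

```python
import math
import random


def random_projection_matrix(original_dim, reduced_dim, rng):
    """Gaussian matrix scaled so squared distances are preserved in expectation."""
    scale = 1.0 / math.sqrt(reduced_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(reduced_dim)]
            for _ in range(original_dim)]


def project(vector, matrix):
    """Multiply a row vector of length original_dim by the projection matrix."""
    reduced_dim = len(matrix[0])
    out = [0.0] * reduced_dim
    for value, row in zip(vector, matrix):
        for j in range(reduced_dim):
            out[j] += value * row[j]
    return out


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


rng = random.Random(42)
original_dim, reduced_dim = 500, 200  # toy sizes, chosen to run quickly
transform = random_projection_matrix(original_dim, reduced_dim, rng)

# Two random vectors standing in for sample expression profiles.
sample_a = [rng.random() for _ in range(original_dim)]
sample_b = [rng.random() for _ in range(original_dim)]

before = euclidean(sample_a, sample_b)
after = euclidean(project(sample_a, transform), project(sample_b, transform))
ratio = after / before
print(f"distance before: {before:.3f}, after: {after:.3f}, ratio: {ratio:.3f}")
```

The ratio should be close to 1, and the approximation tightens as the reduced dimension grows. This is the sense in which distances between samples are preserved only approximately.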

Kallisto index files

Human

Mouse

Kallisto index files used for the alignment process. The index files were built using the Ensembl annotation version 90 and reference cDNA Homo_sapiens.GRCh38.cdna.all.fa.gz and Mus_musculus.GRCm38.cdna.all.fa.gz.
human_index.idx v1
Date: 6/2018
Size: 2.2 GB
mouse_index.idx v1
Date: 6/2018
Size: 1.8 GB

recount2 expression

GTEx

TCGA

Gene counts from GTEx and TCGA from the recount2 project. The reads for these samples were aligned with a different pipeline, resulting in significant differences from the ARCHS4 gene expression. Genes that did not overlap with the genes in the ARCHS4 data were removed.
gtex_matrix.h5 recount2
Date: 10/2017
Size: 589.5 MB
tcga_matrix.h5 recount2
Date: 10/2017
Size: 696.9 MB
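
The gene filtering step described above amounts to keeping only the rows whose gene also appears in a reference gene list. A minimal sketch, assuming one matrix row per gene; the function name, gene names and values are hypothetical, and the actual script used for the recount2 files may differ.

```python
def keep_shared_genes(genes, matrix, reference_genes):
    """Keep only rows whose gene is in reference_genes, preserving row order."""
    reference = set(reference_genes)
    kept = [(gene, row) for gene, row in zip(genes, matrix) if gene in reference]
    return [gene for gene, _ in kept], [row for _, row in kept]


# Toy input: three genes by two samples, plus a reference gene list.
genes = ["GENE_A", "GENE_B", "GENE_X"]
matrix = [[5, 7], [0, 3], [9, 9]]
reference_genes = ["GENE_B", "GENE_A", "GENE_C"]

shared_genes, shared_matrix = keep_shared_genes(genes, matrix, reference_genes)
print(shared_genes, shared_matrix)
```

Filtering this way makes the gene sets comparable, but it does not remove the pipeline-related differences in the count values themselves.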

GitHub repository





The scripts used to process the ARCHS4 data are located at the link below. The project is not easily adapted in its current state. We are working on making the software more accessible in the future.

https://github.com/MaayanLab/archs4



© Ma'ayan Lab.