NCI Data Catalog

The NCI Data Catalog is a listing of data collections produced by major NCI initiatives and other widely used data sets. Data collections included in the catalog meet the following criteria:

  • Produced by NCI intramural researchers or major NCI initiatives
  • Regularly referenced NCI-funded extramural research data
  • Available to all researchers, as either Open or Controlled Access (the latter requiring approval by a Data Access Committee)
  • Well-documented and available for download

This is not a comprehensive listing of data sets available from NCI; more collections will be added to this list.

NCI DATA CATALOG: Data collections produced by major NCI initiatives and other widely used data sets.
Category | Data Collection Name | Description
Animal Models | cancer Models Database, caMOD

The cancer Models Database, caMOD, allows researchers to find information about animal models. The database includes:

  • Model characteristics
  • Genetic description
  • Histopathology
  • Derived cell lines
  • Associated images
  • Carcinogenic agents
  • Therapeutic trials
Cancer Screening Trial | Cancer Data Access System (CDAS)

CDAS is a submission and tracking system for the data from the National Lung Screening Trial (NLST) and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Trial data includes:

  • Summary of the trial
  • Description of the data collected
  • Searchable list of research projects and publications
Drug Discovery | NCI panel of 60 Human Tumor Cell Lines

Gene expression 1

Gene expression 2

NCI-60 is a panel of 60 diverse human cancer cell lines that the NCI Developmental Therapeutics Program has used since 1990 to screen over 100,000 chemical compounds and natural products.

  • The NCI-60 dataset is available for analysis in CellMiner
  • Gene expression data files can be downloaded from an NCI-hosted FTP site
Epidemiology | Surveillance, Epidemiology and End Results (SEER) database

SEER collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 28 percent of the U.S. population. 

The SEER database includes incidence and population data associated by:

  • Age
  • Sex
  • Race
  • Year of diagnosis
  • Geographic areas
Genomics | Cancer Genome Characterization Initiative (CGCI)

CGCI researchers develop and apply advanced sequencing and other genome-based methods to identify novel genetic abnormalities in both adult and pediatric cancers. The genetic profiles are used to inform better cancer diagnosis and treatment. 

  • CGCI data are available through the project data matrix
Genomics | Cancer Genome Workbench (CGWB)

CGWB hosts data from a number of projects, including TCGA, TARGET, COSMIC, GSK, and NCI-60. Data types include:

  • Mutation
  • Copy number
  • Expression
  • Methylation 

CGWB offers tools for visualizing sample-level genomic and transcription alterations in various cancers.

Genomics | Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis, I-SPY 1


The I-SPY 1 TRIAL sought to identify indicators of response to neoadjuvant chemotherapy that predict survival in women with high-risk breast cancer. 

  • Gene expression data files can be downloaded from an NCI-hosted FTP site
Genomics | Molecular Targets for Cancer

Thousands of molecular targets have been measured in the NCI panel of 60 human tumor cell lines. You can search for a target of interest or you may browse through a list of targets.

Measurements include:

  • Protein levels
  • RNA measurements
  • Mutation status
  • Enzyme activity levels
Genomics | NCI Brain Neoplasia Data

Gene expression


NCI Brain Neoplasia Data integrates clinical and functional genomics data from clinical trials involving brain tumor patients and provides the ability to perform ad hoc querying, reporting and analysis across multiple data domains, including gene expression, gene copy number and clinical data.  

  • Data is available for analysis in the Georgetown Database of Cancer (G-DOC)
  • Gene expression data files can be downloaded from an NCI-hosted FTP site
Genomics | TARGET: Therapeutically Applicable Research to Generate Effective Treatments

TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. The goal is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications.

The TARGET data matrix includes:

  • Genomic data
  • Clinical information that cannot be used to identify patients
Genomics | The Cancer Genome Atlas (TCGA)

The Cancer Genome Atlas (TCGA) is a comprehensive effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. The TCGA Data Portal provides a platform for researchers to search, download, and analyze data from over 30 different types of cancer. It contains:

  • Clinical information that cannot be used to identify patients
  • Genomic characterization data
  • High-level sequence analysis of the tumor genomes
Genomics | The NCI Director's Challenge Adenocarcinoma Lung Study

Gene expression

This is a large, training-testing, multi-site, blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses examined whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), could be used to predict overall survival in lung cancer subjects.

  • The DC Lung Study dataset is available for analysis in the Georgetown Database of Cancer (G-DOC)
  • Gene expression data files can be downloaded from an NCI-hosted FTP site
Nanomaterial Characterizations | caNanoLab

caNanoLab includes over 1,000 curated nanomaterials relevant to cancer, with detailed characterizations and associated nanotechnology protocols and publications.

  • Researchers can perform web-based queries and download reports for re-use and additional analysis
Pathways | Pathway Interaction Database (PID)

PID is a collection of curated and peer-reviewed pathways composed of human molecular signaling and regulatory events and key cellular processes.

  • Researchers can use a range of search features that facilitate pathway exploration and reporting
Proteomics | The Clinical Proteomic Tumor Analysis Consortium (CPTAC)

The Clinical Proteomic Tumor Analysis Consortium analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. 

  • The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs)
Target Discovery | Cancer Target Discovery and Development (CTD2)

CTD2 bridges the gap between the enormous volumes of data generated by genomic characterization studies and the ability to use these data for the development of human cancer therapeutics. It specializes in computational and functional genomics approaches critical for translating next-generation sequencing data to clinical applications.

  • CTD2 data are available through the project data matrix
Biospecimens | Biospecimen Research Database (BRD)

BRD is a free, publicly accessible literature database that contains peer-reviewed primary and review articles as well as Standard Operating Procedures (SOPs) in the field of human biospecimen science.

Each literature curation captures the following relevant parameters:

  • Biospecimen investigated, analyte(s) of interest, and technology platforms employed
  • Pre-analytical factors investigated
  • An original summary of relevant results

The SOP collection is organized in a system that includes both SOPs and Biospecimen Evidence-Based Practices (BEBP).


Imaging | The Cancer Imaging Archive (TCIA)

TCIA is a curated archive of medical images accessible for public download. It includes the data from the National Lung Screening Trial (NLST) and many subjects from The Cancer Genome Atlas (TCGA). Data are divided into Collections grouped by common cancer types or research aims. Users can also search these collections by modality, anatomic location, or various acquisition parameters. Pathology imaging, patient demographics/outcomes, expert-derived segmentations/annotations, genomics, and other supporting data are also provided where available.


Updated: March 9, 2015