CRDC, NCI Cloud Resources, and Data Nodes

The goal of the National Cancer Institute’s Cancer Research Data Commons (CRDC) is to empower researchers to accelerate data-driven scientific discovery by connecting diverse datasets with analytical tools in the cloud. The CRDC is built on an expandable data science infrastructure that provides secure access to many different data types across scientific domains via the Data Commons Framework. Users can search and aggregate data across repositories through the Cancer Data Aggregator, which uses a common data model, CRDC-H. The ability to combine diverse data types and perform cross-domain analysis of large cancer datasets can lead to new discoveries in cancer prevention, treatment, and diagnosis, further supporting the goals of precision medicine and the Cancer Moonshot℠. The CRDC will encompass and connect multiple cloud-based data repositories and serve as a central location to support public data sharing for NCI-funded programs.
https://datacommons.cancer.gov/cancer-research-data-commons

Figure 1: NCI Cancer Research Data Commons (CRDC website, https://datascience.cancer.gov/data-commons)

The National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) provides an ecosystem that enables access to many NCI-funded programs through interconnected infrastructures such as the NCI Cloud Resources listed below (descriptions quoted from the CRDC):


FireCloud Powered by Terra

“Broad Institute FireCloud is a NCI Cloud Resource project powered by Terra for biomedical researchers to access data, run analysis tools, and collaborate.”
FireCloud: https://firecloud.terra.bio
Resource Portal: https://firecloud.terra.bio


Institute for Systems Biology Cancer Genomics Cloud (ISB-CGC)

“The ISB Cancer Genomics Cloud is democratizing access to NCI Cancer Data (TCGA, TARGET, CCLE) and coupling it with unprecedented computational power to allow researchers to explore and analyze this vast data-space.”
ISB-CGC: https://portal.isb-cgc.org
Resource Portal: https://portal.isb-cgc.org


Seven Bridges Cancer Genomics Cloud

“The Cancer Genomics Cloud (CGC), powered by Seven Bridges, is one of three systems funded by the National Cancer Institute to explore the paradigm of colocalizing massive public datasets, like The Cancer Genomics Atlas (TCGA), alongside secure and scalable computational resources to analyze them. The CGC makes more than two petabytes of multi-dimensional data available immediately to authorized researchers. You can add your own data to analyze alongside the public datasets using predefined analytical workflows or your own tools. Every execution is fully reproducible, and collaborating with your team is simple and secure.”
Seven Bridges CGC: http://www.cancergenomicscloud.org/
Resource Portal: https://cgc-accounts.sbgenomics.com/