Frequently Asked Questions

Data Access

1. What authentication and authorization options are available within NCI DCFS?

Authentication options include eRA Commons IDs (for controlled data), NCI DCFS, and individual platforms' own OIDC authentication. Authorization is available through dbGaP access, DCFS, and/or authorization enabled by Trusted Partnerships with NIH.

2. How do users of Cloud Resources access a Data Node’s data?

Please refer to the following instructions.

3. How do Data Node users authenticate and gain authorization through NCI DCFS?

Please refer to the following instructions.


Indexing

1. How does a Data Node’s data get indexed?

To index data, the Data Node prepares a manifest file with GUIDs and shares it with DCFS. Please see the manifest format described here. DCFS then indexes the data according to the manifest shared by the Data Node; if the Data Node does not have unique IDs, DCFS will create GUIDs. Once the data in the buckets are indexed, authorized users can request pre-signed URLs for any object registered in the Indexd service.
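As an illustration only, the sketch below shows how a Data Node might serialize its records into a tab-separated manifest. It assumes the column names guid, md5, size, acl, and urls, which are common in Gen3 indexing manifests; the authoritative column list is the manifest format DCFS provides, and the bucket URL and phs accession shown are hypothetical placeholders.

```python
import csv
import io

# Assumed column set, modeled on common Gen3 indexing manifests.
# Confirm the exact columns against the manifest format DCFS provides.
COLUMNS = ["guid", "md5", "size", "acl", "urls"]


def build_manifest(records):
    """Serialize a list of record dicts into a tab-separated manifest string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = dict(rec)
        # Multi-valued fields (several ACLs or cloud URLs) are joined with spaces;
        # this convention is an assumption, not a confirmed DCFS requirement.
        row["acl"] = " ".join(rec["acl"])
        row["urls"] = " ".join(rec["urls"])
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    example = [
        {
            "guid": "",  # empty when the Data Node has no GUID; DCFS would assign one
            "md5": "d41d8cd98f00b204e9800998ecf8427e",
            "size": 0,
            "acl": ["phs000000"],  # hypothetical study accession
            "urls": ["s3://example-data-node-bucket/sample.bam"],  # hypothetical bucket
        }
    ]
    print(build_manifest(example))
```

Listing both an AWS and a GCP URL for the same object in the urls field is one way a manifest could reflect data stored in more than one cloud.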

2. What happens if our Data Node has already generated GUIDs for its data?

If the Data Node has already generated GUIDs, DCFS will add a prefix (namespace) to them. If the Data Node has not generated GUIDs for its data objects, DCFS will generate them.
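The two cases above can be sketched as a small helper that produces GUIDs in the dg.prefix/UUID form used by DCFS. This is a hypothetical illustration: the real prefix is assigned by DCFS, and minting a random version-4 UUID for objects without an ID is an assumption, since the scheme DCFS actually uses is not specified here.

```python
import uuid

# Hypothetical placeholder; the real namespace is assigned by DCFS (e.g. dg.4503).
PREFIX = "dg.XXXX"


def assign_guid(existing_id=None, prefix=PREFIX):
    """Return a GUID of the form <prefix>/<UUID>.

    If the Data Node already has a UUID, keep it and add the namespace;
    otherwise mint a new random (version 4) UUID.
    """
    if existing_id:
        # uuid.UUID validates the input (raising ValueError if malformed);
        # str() normalizes it to lowercase, hyphenated form.
        core = str(uuid.UUID(existing_id))
    else:
        core = str(uuid.uuid4())
    return f"{prefix}/{core}"


if __name__ == "__main__":
    print(assign_guid("00E6CFA9-A183-42F6-BB44-B70347106BBE", prefix="dg.4503"))
    print(assign_guid(prefix="dg.4503"))
```

Keeping the Data Node's existing UUID intact means the same object can still be matched to the Data Node's own records after DCFS adds the prefix.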

3. What is the format of the manifest a Data Node provides to DCFS?

Please refer to the following manifest format.


Testing Access

1. How can I test connections between a Data Node and a Cloud Resource?

Data in the Data Node's buckets will be available to Cloud Resources through the Fence URL. For testing, DCFS will create an OIDC client for the Data Node in the testing (staging) environment. DCFS will share the client_id and client_secret the Data Node needs to connect to the DCFS staging environment, along with the endpoints for requesting pre-signed URLs. With this information, the Data Node can test the OIDC flow and obtain signed URLs.
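The flow described above is a standard OAuth2/OIDC authorization-code exchange. The sketch below only builds the requests, so it runs without network access. The base URL is a hypothetical placeholder, and the paths /oauth2/authorize, /oauth2/token, and /data/download/{guid} are assumptions based on typical Gen3 Fence deployments; use the endpoints DCFS actually sends you.

```python
import base64
import secrets
from urllib.parse import urlencode

# Hypothetical staging base URL; DCFS shares the real endpoints with the Data Node.
FENCE_BASE = "https://fence-staging.example.org/user"


def authorize_url(client_id, redirect_uri, state, scopes=("openid", "user", "data")):
    """Step 1: URL the user's browser is sent to in order to log in and consent."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # assumed scopes; confirm with DCFS
        "state": state,  # random value checked on redirect to prevent CSRF
    }
    return f"{FENCE_BASE}/oauth2/authorize?{urlencode(params)}"


def token_request(client_id, client_secret, code, redirect_uri):
    """Step 2: build the POST that exchanges the authorization code for tokens.

    Returns (url, headers, form_body); sending it is left to any HTTP client.
    """
    # HTTP Basic client authentication using client_id and client_secret.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    })
    return f"{FENCE_BASE}/oauth2/token", headers, body


def presigned_url_endpoint(guid):
    """Step 3: GET this with 'Authorization: Bearer <access_token>' to get a signed URL."""
    return f"{FENCE_BASE}/data/download/{guid}"


if __name__ == "__main__":
    state = secrets.token_urlsafe(16)
    print(authorize_url("my-client-id", "https://node.example.org/callback", state))
```

Separating request construction from sending keeps the sketch testable and lets the Data Node plug in whichever HTTP client it already uses.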

2. Can we have a whitelist of users for testing purposes in staging?

Yes. Please provide us with a list of users, and we will grant them access. We will create a GitHub repository containing a user.yaml file, which your group can manage to add or remove users.

3. Our data has been indexed, and we want to access it through DCFS. What is the Fence URL, and how can we access data objects?

The Fence URL is the same as for GDC data. GUIDs consist of a prefix followed by your generated UUID, in the format dg.prefix/UUID (for example, dg.4503/00e6cfa9-a183-42f6-bb44-b70347106bbe).
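Before requesting signed URLs for indexed objects, it can help to check that GUIDs follow the dg.prefix/UUID format. The sketch below is a hypothetical validator; it assumes the namespace is "dg." followed by letters or digits, as in the dg.4503 example, which may be stricter or looser than what DCFS actually allows.

```python
import re
import uuid

# Assumed shape of a prefixed GUID: "dg.<namespace>/<UUID>".
PREFIXED_GUID = re.compile(r"^(dg\.[A-Za-z0-9]+)/(.+)$")


def split_guid(guid):
    """Split a prefixed GUID into (prefix, UUID); raise ValueError if malformed."""
    m = PREFIXED_GUID.match(guid)
    if not m:
        raise ValueError(f"not a prefixed GUID: {guid!r}")
    prefix, core = m.groups()
    # uuid.UUID raises ValueError for invalid hex; also require canonical hyphenated form.
    parsed = uuid.UUID(core)
    if str(parsed) != core.lower():
        raise ValueError(f"UUID part is not in canonical form: {core!r}")
    return prefix, str(parsed)


if __name__ == "__main__":
    print(split_guid("dg.4503/00e6cfa9-a183-42f6-bb44-b70347106bbe"))
```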

4. Where can we find more information about the Fence API?

Please refer to the Fence OpenAPI Specification.


General Questions

1. What is Gen3, and how does NCI DCFS relate to Gen3?

Gen3 is an open-source platform, a collection of microservices, that enables standing up data commons for managing, analyzing, and sharing research data. NCI DCFS uses the Indexd microservice for data indexing and the Fence microservice for authentication and authorization. Open APIs allow the microservices to interact with each other and with external users.

2. Can a data commons outside of CRDC interoperate through NCI DCFS?

A data commons outside of CRDC can call our APIs. If that data commons wishes to interact on behalf of a user, a supplementary process is required: the procedure includes security and compliance assurances signed by the Data Node and setting up the Data Node as an OIDC client.

3. What data are available through DCFS, and how do I access them?

23+ datasets are currently available, covering genomics, proteomics, imaging, and more. Some data are available in both AWS and GCP; others are available on only one cloud infrastructure. New data are added continually through existing and new Data Nodes. Please check the full list for more detailed information.