Data Submission

Overview

The data submission portal enables laboratories to submit data to the Brain Image Library (BIL). The key steps for submitting data are:

  1. Create an account.
  2. Log in to the data submission portal.
  3. Create a submission: A submission is a collection of image sets and their associated metadata.
  4. Upload image data: Because this is usually the slowest step, we suggest starting it before uploading the metadata for the submission.
  5. Upload metadata: Submission metadata can be uploaded through the portal in spreadsheet format.
  6. Validate data: Users request validation of the data and correct any errors that are found.
  7. Data is made public: Once data is validated and metadata is curated, the submission is made public.
Data Submission Process

How to submit data

Once you have created an ACCESS portal account, requested access to the BIL submission portal, and set your initial password, you are ready to use the submission portal.

1. Enter the submission portal

Visit submit.brainimagelibrary.org and enter your PSC username and password.

2. Define your projects (PIs and Data Managers Only)

The functions described in this step are available only to users with PI or Data Manager access. If you need access to these functions, ask your PI to email bil-support@psc.edu with your BIL user ID and name to request it.

The PI dashboard allows PIs and Data Managers to define and manage projects. From this dashboard, the PI can add authorized users to a project and see the status of in-progress, unsuccessful, and successful submissions, regardless of which user uploaded and submitted the data.

You will need to create a project that all your BIL submissions will belong to. Your projects and sub-projects can represent different grants and different experimental projects and are a way for you to organize data being submitted at different times. You will only need to create a project/sub-project once, not every time you submit data to BIL.

Projects at BIL are organized in a “Parent” to “Child” hierarchy, similar to a family tree: every project or sub-project is either the parent or the child of a linked project.

Add/edit projects

To enter (or edit) project information, select the Manage Projects option. From this menu, you can create a new project and view both the personnel and submissions associated with each project.

To define a main project:

  1. Select the Create a New Project button.
  2. In the Project Name field, enter a name for your project. For your main project, use the name of your grant.
    • Ex: R24 Ropelewski: A Confocal Fluorescence Microscopy Brain Data Archive
  3. Enter the grant number associated with the project in the Funded By: field.
    • Ex: 5-R24-MH114793-07
  4. Choose the Consortia Affiliation.
  5. For your main project, leave the Project Affiliation field empty.
  6. Select Submit New Project.

Your main project for your grant will be created. Now, you can create a sub-project for your experiments or other groupings of submissions.

To define a sub-project:

  1. Select the Create a New Project button.
  2. In the Project Name field, enter a name for your sub-project.
    • For BICAN Only: This will become your Data Project name.
  3. Enter the grant number associated with the sub-project in the Funded By: field. This should be the same as the grant for the main project.
    • Ex: 5-R24-MH114793-07
  4. Choose the appropriate Consortia Affiliation.
    • Options include BICCN, BICAN, and SSPsyGene. Please contact bil-support@psc.edu if you are part of a consortium that is not listed.
  5. From the Project Affiliation drop-down menu, choose the main project/grant that this sub-project is associated with.
  6. Select Submit New Project.

If the project is a BICAN project, please be sure to select BICAN as its Consortia Affiliation and make sure that the Project Name field contains the project name used for reporting.

Add users to a project

To add a user (such as a data submitter) to a project, select the Manage Projects link from the main dashboard, then select View Personnel for that project. Select the user from the list, and then select Submit at the bottom of the page.

3. Create a submission

A project MUST be defined before you can create a BIL submission. See step 2 above for details.

A submission contains one or more related datasets and associated metadata. Submissions will inherit project metadata (such as grant), thus all datasets within a submission must belong to the same project.

A submission can contain one or more datasets, but smaller submissions are recommended: a submission is published only when every dataset in it passes validation, so a single failing dataset holds back the rest of the submission.

To create a submission, select the Submission sub-option of the New menu.

Next, enter the required metadata associated with the submission (see image below).

When you are done filling out the form, select the Save button. You will then be automatically taken to the metadata submission step.

When created, each BIL submission is assigned a unique 16-character hexadecimal identifier (e.g. 6247417d691a4548) and a unique Dropbox-like landing zone directory (e.g. /bil/lz/username/6247417d691a4548). This landing zone directory is where the datasets belonging to the submission must be transferred for validation and ingestion processing.
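If you script your transfers, you can derive the landing zone path from the submission identifier instead of typing it by hand. The following is a minimal shell sketch, not a BIL-provided tool: the function name landing_zone_path is hypothetical, and the rule that identifiers are exactly 16 lowercase hexadecimal characters is an assumption based only on the example identifiers in this guide. The Data Path shown in the portal is always the authoritative value.

```shell
#!/bin/sh
# Hypothetical helper: print the landing zone path for a BIL submission.
# Assumption: submission IDs are 16 lowercase hexadecimal characters, as in
# the examples 6247417d691a4548 and abcdef0123456789. Always confirm the path
# against the Data Path field in the submission portal.
landing_zone_path() {
  # $1 = BIL/PSC username, $2 = submission ID
  # Explicit character list (not a-f range) so locale settings cannot matter.
  case "$2" in
    ''|*[!0123456789abcdef]*)
      echo "submission ID must be lowercase hexadecimal: '$2'" >&2
      return 1 ;;
  esac
  if [ "${#2}" -ne 16 ]; then
    echo "submission ID must be 16 characters long: '$2'" >&2
    return 1
  fi
  printf '/bil/lz/%s/%s\n' "$1" "$2"
}
```

For example, landing_zone_path testuser abcdef0123456789 prints /bil/lz/testuser/abcdef0123456789, while an identifier with a typo or the wrong length is rejected with an error message.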

4. Upload metadata

BICAN ONLY Prerequisites

After the submission is created, you will automatically be taken to the metadata submission page. You can also reach this page from the submission portal by selecting New and then New Metadata in the drop-down menu.

If you are NOT ready to upload metadata at this time, select Cancel to exit the metadata upload step. When you are ready to load your metadata, return to the portal and select New and then New Metadata in the drop-down menu.

First, choose the submission you want to upload metadata for from the drop-down menu, then select Submit.

At this point, you can download the metadata spreadsheet from the submission portal. Next, select the metadata schema (upload option) that matches your data. Choose carefully, as this selection affects how your metadata is ingested.

Metadata upload options

  1. Each dataset was generated from a single sample and donor (e.g. whole brain imaging) on the same instrument. Multiple datasets can be accommodated in a single submission. The lines of the specimen tab in the metadata spreadsheet must correspond exactly to the lines in the dataset tab (i.e. the specimen listed in specimen tab row 5 is the specimen for dataset row 5). Each line in the dataset tab should have a corresponding line in the image tab. There should be exactly one line in the instrument tab.

  2. The dataset was generated from multiple samples, specimens, or donors on the same instrument. Only a single dataset can be accommodated per submission. Multiple specimens should be listed in the metadata spreadsheet, but only a single dataset. There should be one line in the image tab. There should be exactly one line in the instrument tab.

  3. All datasets were generated from one specimen, but are from unique regions of interest on the specimen. Multiple datasets can be accommodated in a single submission. List the specimen multiple times in the specimen tab, once per dataset, and identify the specific region of interest in the locations column. The lines in the specimen tab must correspond exactly to the lines in the dataset tab (i.e. the specimen listed in specimen tab row 5 is the specimen for dataset row 5). Each line in the dataset tab should have a corresponding line in the image tab. There should be exactly one line in the instrument tab.

  4. All datasets were generated from one specimen, but with different experimental (instrument) parameters. Multiple datasets can be accommodated in a single submission. There should be exactly one entry in the specimen tab. The lines in the instrument tab of the metadata spreadsheet must correspond exactly to the lines in the dataset tab (i.e. the instrument listed in instrument tab row 5 is the instrument for dataset row 5). Each line in the dataset tab should have a corresponding line in the image tab.

  5. Reconstructed neurons in the standardized SWC format. All reconstructions were generated from one specimen. There should be exactly one line in the specimen tab and exactly one line in the dataset tab. No information should be listed in the image tab. File format specification: http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html
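Because the row correspondence rules above are easy to break when editing a spreadsheet by hand, a quick count check before uploading can catch mismatches. The shell sketch below assumes, hypothetically, that each tab has been exported to its own CSV file (e.g. specimen.csv, dataset.csv, instrument.csv) with exactly one header row and no line breaks inside quoted fields; the file and function names are illustrative only. Equal row counts are necessary but not sufficient: you still need to confirm that row N of one tab really describes the dataset in row N of the other.

```shell
#!/bin/sh
# Hypothetical sanity check for the metadata spreadsheet. Assumes each tab was
# exported to its own CSV file with exactly one header row and no quoted
# fields that contain line breaks.

# Print the number of data rows (non-blank lines after the header).
data_rows() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Succeed only if two tab exports have the same number of data rows, as
# required when row N of one tab describes the dataset in row N of the other.
rows_match() {
  a_rows=$(data_rows "$1") || return 1
  b_rows=$(data_rows "$2") || return 1
  if [ "$a_rows" -ne "$b_rows" ]; then
    echo "row count mismatch: $1 has $a_rows data rows, $2 has $b_rows" >&2
    return 1
  fi
}
```

For upload options 1 and 3, you might run rows_match specimen.csv dataset.csv; for option 4, rows_match instrument.csv dataset.csv.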

After a metadata spreadsheet is prepared, select the submission it will be associated with. Then select the Upload Metadata button to select your metadata spreadsheet and upload it. This button will not be available until a metadata upload option is selected.

When preparing the metadata spreadsheet, list a separate data directory for each dataset in the submission.

BICAN Identifiers

If your project is part of the BICAN consortium, after you have uploaded your metadata, you will be prompted to attach NHASH IDs from the NIMP Specimen Portal so that your data can be easily linked to the tissues and specimens within the consortium.

After you select the Upload Metadata button, you will be prompted to provide the BICAN Tissue NHASH Identifiers corresponding to the specimen information in your uploaded metadata. You can either (1) upload a specimen spreadsheet containing the NHASH identifiers or (2) enter the NHASH identifiers directly in the portal.

Once the identifiers are uploaded or entered, select Save BICAN IDs at the bottom of the page.

The next page will ask you to confirm the NHASH Results. Review the information, and after you have ensured it is correct, select Confirm at the bottom of the page.

5. Upload image data

Currently supported file formats include TIFF, OME-TIFF, and OME-Zarr.
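Before starting a long transfer, it can save time to confirm that a dataset directory contains only supported formats. The sketch below is a hypothetical local pre-check, not part of BIL validation: it treats *.tif and *.tiff files (in any letter case, which also covers OME-TIFF files named *.ome.tif or *.ome.tiff) as TIFF/OME-TIFF, and assumes an OME-Zarr store is a directory whose name ends in .zarr, so every file inside such a directory is accepted.

```shell
#!/bin/sh
# Hypothetical pre-upload check. Lists files in a dataset directory that are
# neither TIFF/OME-TIFF (*.tif, *.tiff, any case) nor inside an OME-Zarr store
# (assumed here to be a directory whose name ends in .zarr).
unsupported_files() {
  find "$1" -type f ! -iname '*.tif' ! -iname '*.tiff' ! -path '*.zarr/*' -print
}

# Fail, and print the offending files, if anything unsupported is found.
check_formats() {
  bad_files=$(unsupported_files "$1") || return 1
  if [ -n "$bad_files" ]; then
    printf 'Not TIFF, OME-TIFF, or OME-Zarr:\n%s\n' "$bad_files" >&2
    return 1
  fi
}
```

For example, check_formats mouse1 succeeds silently when mouse1 holds only TIFF files or .zarr stores, and otherwise lists the other files it found (such as stray notes or hidden system files).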

The submission portal creates an upload landing zone for you to transfer your image data to. To find it, select Submissions from the View menu; the Data Path field shows the landing zone for each submission.

Note that a separate subdirectory is required for each dataset in the submission. For example, if you had an experiment containing 5 mouse datasets that you wanted to include as a single submission, you would create 5 subdirectories in the submission's landing zone, one for each mouse dataset:

/bil/lz/testuser/abcdef0123456789/mouse1
/bil/lz/testuser/abcdef0123456789/mouse2
/bil/lz/testuser/abcdef0123456789/mouse3
/bil/lz/testuser/abcdef0123456789/mouse4
/bil/lz/testuser/abcdef0123456789/mouse5

To tie metadata to an image dataset, each image dataset must be uploaded in a separate subdirectory.
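The one-subdirectory-per-dataset rule can be checked on a local copy of the submission before uploading. This hypothetical shell function (check_layout is an illustrative name) fails if any regular file sits directly in the submission directory or if there are no dataset subdirectories, and otherwise prints one line per dataset subdirectory, which you can compare with the data directories listed in your metadata spreadsheet. It is a sketch under those assumptions, not a replacement for BIL validation.

```shell
#!/bin/sh
# Hypothetical layout check for a local copy of a submission directory.
# Every dataset must sit in its own subdirectory, so regular files directly
# under the top level are reported as errors.
check_layout() {
  loose=$(find "$1" -mindepth 1 -maxdepth 1 -type f -print) || return 1
  if [ -n "$loose" ]; then
    printf 'Files outside a dataset subdirectory:\n%s\n' "$loose" >&2
    return 1
  fi
  datasets=$(find "$1" -mindepth 1 -maxdepth 1 -type d -print | sort)
  if [ -z "$datasets" ]; then
    echo "No dataset subdirectories found in $1" >&2
    return 1
  fi
  # One line per dataset subdirectory, e.g. .../mouse1, .../mouse2
  printf '%s\n' "$datasets"
}
```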

Because of its size, image data cannot be uploaded through the submission portal (submit.brainimagelibrary.org). It must be uploaded separately through the BIL data transfer nodes, which are available at the virtual host upload.brainimagelibrary.org. All users authorized to use the data submission portal are also authorized to use the data transfer nodes, and the username and password are the same on both systems.

Data Upload Methods

Several supported methods can be used to upload files into the landing zone directory through the BIL data transfer nodes, including rsync, Globus, sftp, and scp.

rsync

The example below uploads all data in the directory mouse1, as user testuser, through the data transfer node upload.brainimagelibrary.org to the landing zone of submission abcdef0123456789:

$ rsync -lrtpDvP mouse1 testuser@upload.brainimagelibrary.org:/bil/lz/testuser/abcdef0123456789
sending incremental file list
mouse1/
mouse1/data1.tiff
     1356122 100%  126.20MB/s    0:00:00 (xfer#1, to-check=0/2)

sent 1356392 bytes  received 35 bytes  2712854.00 bytes/sec
total size is 1356122  speedup is 1.00
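When a submission has several dataset subdirectories, the rsync command above can be repeated once per dataset. The following shell sketch does that for every subdirectory of a local staging directory, using the same rsync options as the example; the function name upload_datasets and the DRY_RUN variable are hypothetical conveniences, and the username and submission ID are placeholders for your own values.

```shell
#!/bin/sh
# Hypothetical helper: upload each dataset subdirectory of a local staging
# directory to a submission's landing zone, one rsync call per dataset.
# Set DRY_RUN=1 to print the rsync commands instead of running them.
upload_datasets() {
  # $1 = BIL/PSC username, $2 = submission ID, $3 = local staging directory
  dest="$1@upload.brainimagelibrary.org:/bil/lz/$1/$2/"
  for dir in "$3"/*/; do
    [ -d "$dir" ] || continue  # glob matched nothing: no subdirectories
    # No trailing slash on the source, so rsync recreates e.g. mouse1/ under
    # the landing zone, matching the one-directory-per-dataset layout.
    ${DRY_RUN:+echo} rsync -lrtpDvP "${dir%/}" "$dest" || return 1
  done
}
```

For example, DRY_RUN=1 upload_datasets testuser abcdef0123456789 ./staging prints the rsync commands without running them; without DRY_RUN=1 the uploads run, and each rsync call may prompt for your password. If a transfer is interrupted, rerunning the same command is safe: with -t, rsync skips files whose size and modification time already match, and -P (--partial --progress) keeps partially transferred files so they can be resumed.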

Globus

To upload to BIL via Globus, we will need your CILogon ePPN. To set up Globus for the first time:

  1. Find your CILogon ePPN by logging in with your preferred identity at https://cilogon.org/.
  2. Send an e-mail to bil-support@psc.edu with your name, institution, and the value of the ePPN field under "User Attributes" from the CILogon website.
  3. Once your ePPN has been mapped to your BIL/PSC username, you will be able to log in via the Globus website.

Once your ePPN has been mapped to your BIL user, you can upload via Globus:

  1. Log in at https://globus.org/: select the LOG IN button at the top right to begin.

  2. From the drop-down list, find the organization for the identity associated with the ePPN we have mapped for you. This could either be ACCESS (XSEDE), or your institution (e.g. Carnegie Mellon University). After you select the organization for the identity that you want to authenticate with, select the Continue button to proceed.

  3. You will be redirected to your organization's login page, where you enter your account name and password, and possibly complete two-factor authentication (e.g. DUO), to log in.

  4. Upon successful authentication to your organization, your browser will show the Globus File Manager page where you can connect to the Brain Image Library service.
    Click in the field labeled "Collection" (the Globus term for a remote data site), and start typing "Brain Image Library". This should bring up the new Globus collection labeled "Brain Image Library /bil filesystem" on the Brain Image Library Globus Connect Server 5 Endpoint. Select it to proceed.

  5. The first time that you connect to this Globus collection, you may be asked to provide consent to use your identity to authenticate to it. Select "Continue" to proceed.

  6. There may be more than one identity mapped to your BIL/PSC username. Select the identity that you used to log in to Globus, and then select "Continue" to proceed.

  7. The Globus service will then ask you to allow the service to use your identity to facilitate activities including file transfers, file management, and remote access to data. Select "Allow" to proceed.

  8. Once you have allowed the Globus service to use your identity to authenticate you to the BIL Globus service, you will be connected and arrive at the Path /bil. You can navigate by clicking or by typing the path to your desired location in the Path field.
    To upload data, navigate to your submission's landing zone under /bil/lz (e.g. /bil/lz/testuser/abcdef0123456789).

  9. When you are done with your Globus session, we recommend logging out completely. This is a two-step process: first, select the LOGOUT icon near the bottom of the left sidebar, which logs you out of the Globus File Manager web app; then select the Globus ID "Log Out" link to log out of the Globus service itself.

If you require assistance in accessing the BIL Globus service, please reach out to bil-support@psc.edu.

sftp

Using sftp to upload.brainimagelibrary.org: The example below logs in to the data transfer node as testuser. The first cd command moves to the BIL landing zone for the submission. The mkdir command creates a subdirectory (called mouse1) in the landing zone, and the second cd command moves into this subdirectory. Finally, the put command uploads data, in this case a single file (data1.tiff):
$ sftp testuser@upload.brainimagelibrary.org
The authenticity of host 'upload.brainimagelibrary.org (128.182.108.164)' can't be established.
ECDSA key fingerprint is 32:cf:46:44:3d:9c:8e:b2:1d:14:03:66:45:0b:11:29.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'upload.brainimagelibrary.org,128.182.108.164' (ECDSA) to the list of known hosts.
testuser@upload.brainimagelibrary.org's password:
Connected to upload.brainimagelibrary.org.
sftp> cd /bil/lz/testuser/abcdef0123456789
sftp> mkdir /bil/lz/testuser/abcdef0123456789/mouse1
sftp> cd /bil/lz/testuser/abcdef0123456789/mouse1

sftp> put data1.tiff
Uploading data1.tiff to /bil/lz/testuser/abcdef0123456789/mouse1/data1.tiff
data1.tiff                                      0%    0     0.0KB/s   --:-- ETA
data1.tiff                                    100% 1324KB   1.3MB/s   00:00

sftp> exit

Because rsync and Globus can resume interrupted transfers, they are recommended over sftp.
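If you do use sftp, a whole dataset directory can be uploaded in one session rather than one file at a time by generating the commands from the session above and piping them into sftp. This is a minimal sketch under stated assumptions: sftp_commands is a hypothetical name, directory names must not contain spaces, and it relies on OpenSSH sftp's put -r (recursive upload) and on a leading "-" before mkdir, which tells sftp to carry on if the directory already exists.

```shell
#!/bin/sh
# Hypothetical helper: print sftp commands that upload a whole dataset
# directory into a submission's landing zone. "-mkdir" (leading dash) tells
# sftp to continue even if the directory already exists; "put -r" uploads
# the directory recursively. Assumes names without spaces.
sftp_commands() {
  # $1 = BIL/PSC username, $2 = submission ID, $3 = local dataset directory
  printf 'cd /bil/lz/%s/%s\n' "$1" "$2"
  printf '%s\n' "-mkdir $(basename "$3")"
  printf '%s\n' "put -r $3"
  printf '%s\n' "bye"
}
```

For example, sftp_commands testuser abcdef0123456789 mouse1 | sftp testuser@upload.brainimagelibrary.org should upload mouse1 into the landing zone; sftp reads the commands from the pipe, while the password prompt still appears on your terminal.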

scp

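Using scp to upload.brainimagelibrary.org: The example below uploads a single file (data1.tiff) to the mouse1 subdirectory of the landing zone as testuser. scp does not create the destination subdirectory, so it must already exist (for example, created with mkdir in an sftp session as shown above):
$ scp data1.tiff testuser@upload.brainimagelibrary.org:/bil/lz/testuser/abcdef0123456789/mouse1/data1.tiff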
testuser@upload.brainimagelibrary.org's password:
data1.tiff                                                                      0%    0     0.0KB/s   --:-- ETA
data1.tiff                                                                    100% 1324KB   1.3MB/s   00:00

Because rsync and Globus can resume interrupted transfers, they are recommended over scp.

rclone

For uploading data stored on Google Drive, you can use rclone to transfer data directly to the landing zone. The example below assumes rclone remotes have already been configured for the source (your_bucket) and for BIL (brainimagelibrary); -P shows transfer progress:

$ rclone copy your_bucket:data_directory/file_name.zarr brainimagelibrary:/bil/lz/testuser/abcdef0123456789/file_name.zarr -P
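The rclone example above assumes a remote named brainimagelibrary already exists. One way to create it, sketched here under the assumption that the BIL data transfer node accepts rclone's SFTP backend (please confirm with bil-support@psc.edu), is rclone config create with the sftp backend and its ask_password option, so that your PSC password is requested at connection time rather than stored in rclone.conf. The function name configure_bil_remote and the DRY_RUN variable are hypothetical; check rclone config create --help for your rclone version.

```shell
#!/bin/sh
# Hypothetical helper: create the rclone remote "brainimagelibrary" as an
# SFTP remote on the BIL data transfer node. ask_password=true makes rclone
# prompt for your PSC password when it connects instead of storing it in
# rclone.conf. Set DRY_RUN=1 to print the command without running it.
configure_bil_remote() {
  # $1 = BIL/PSC username
  ${DRY_RUN:+echo} rclone config create brainimagelibrary sftp \
    host upload.brainimagelibrary.org user "$1" ask_password true
}
```

After running configure_bil_remote testuser, rclone lsd brainimagelibrary:/bil/lz/testuser should list your landing zone directories.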

6. Validate and submit publish request

Once all data has been uploaded to the landing zone and the metadata has been submitted, request that the data be validated and made publicly available. You can do this from the submission portal by selecting Submit Publish Request from the main menu. To request an embargo period, email bil-support@psc.edu with the submission ID and a brief note.

To submit your datasets for publication:

  1. Log in to the submission portal https://submit.brainimagelibrary.org/
  2. On the top navigation bar, select "Submit Publish Request"
  3. Select the submission you would like to publish from the list by selecting the check box on the left
  4. Select "Submit Validation Request"

If data validation fails, the data submitter will be notified by email. The submitter should address the validation issue(s) and resubmit the validation request. Both data and metadata must pass the validation checks; datasets that fail validation are considered incomplete and will not be made publicly available.

Glossary

The following terms in this document have specific context, as defined below:

Dataset
At BIL, a dataset is a stand-alone entry of an image volume or image set associated with a single subject or experimental unit with unique metadata. A dataset usually corresponds to a single donor or subject when a submission contains multiple subjects, or to a single part of the brain when a submission images many parts of the brain. One or more datasets make up a submission. A dataset typically contains many 2D image files that are assembled to form a two- or three-dimensional volume.

Submission
A submission contains one or more related datasets and the associated metadata. Submissions will inherit project metadata (such as the NIH project, grant number, laboratory name, etc.), thus all datasets within a submission must belong to the same project. In general, smaller submissions are recommended, because a submission is published only when every dataset in it passes validation. Each "level" of data should be uploaded as a separate submission (e.g. a set of raw data and the same data aligned to a reference are two separate submissions).

Project
A way of grouping submissions to ensure that (1) people working with the data gain appropriate access and credit for their contributions and (2) data is linked to the proper funding sources.

Sub-Project
A way of grouping data within a BIL Project to link data from the same funding source and to group data from the same experiment or scientific project.

Publish
The act of making a submission that has passed all validation checks publicly available.