Brain Image Library: Data Submission Best Practices


This document outlines recommended best practices for structuring your data uploads to the Brain Image Library (BIL). If you have questions about structuring your data that are not answered here, please contact the help desk (bil-support@psc.edu).

General Practices

Perform general file cleanup after data transfer

Be aware that hidden files created by your operating system may be transferred along with your data. These should be removed from the BIL landing-zone directory before your data is made public. These files may include:

.DS_Store -- An Apple macOS file (for more information see: https://en.wikipedia.org/wiki/.DS_Store)

Thumbs.db -- A Microsoft Windows file (for more information see https://en.wikipedia.org/wiki/Windows_thumbnail_cache)
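
A minimal Python sketch for finding these files before your data is made public, assuming you can run Python 3 somewhere your landing zone is accessible (an assumption; adapt to your setup). The helper names find_os_junk and remove_os_junk are hypothetical and not part of any BIL tool. By default it only lists what it finds; pass dry_run=False to actually delete.

```python
import os
from pathlib import Path

# The OS-generated files named above; extend this set if your system creates others.
JUNK_NAMES = {".DS_Store", "Thumbs.db"}


def find_os_junk(root):
    """Return sorted paths of OS-generated hidden files anywhere under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in JUNK_NAMES:
                hits.append(Path(dirpath) / name)
    return sorted(hits)


def remove_os_junk(root, dry_run=True):
    """List (dry_run=True) or delete (dry_run=False) the junk files under root."""
    hits = find_os_junk(root)
    for path in hits:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink()
    return hits
```

For example, remove_os_junk("/bil/lz/testuser/abcdef0123456789") prints each file that would be removed; run it again with dry_run=False once the list looks right.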

Avoid problematic special characters

Use only characters that are generally safe in filenames, and avoid special characters that can be problematic on Unix/Linux systems.

Characters that are generally safe to include in filenames:

Underscore _

Dash -

Period .

Lower-case letters a-z

Upper-case letters A-Z

Numbers 0-9

Special characters that can be problematic:

Space

Ampersand &

Slash /

Vertical bar |

Colon :

Semicolon ;

Greater-than >

Less-than <

Question mark ?

Asterisk *
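
The safe-character list above can be checked with a short Python sketch. It treats any character outside that list as unsafe, which is stricter than the list of problem characters (it also flags, for example, parentheses and non-ASCII letters). The helper names are hypothetical.

```python
import os
import re

# Only the characters listed above as generally safe: _ - . a-z A-Z 0-9
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_name(name):
    """Return True if name is non-empty and every character is in the safe set."""
    return SAFE_NAME.fullmatch(name) is not None


def unsafe_names(root):
    """Yield paths under root whose file or directory name has an unsafe character."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_safe_name(name):
                yield os.path.join(dirpath, name)
```

Running list(unsafe_names(".")) from the top of your local copy of the data before uploading lets you rename offending files while they are still easy to change.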

Only place subdirectories in the root of the landing-zone

The submission portal creates a landing-zone directory for you to upload your data to. Do not place files directly in this root directory; instead, create one or more subdirectories here, one for each dataset you are uploading, and upload each dataset into its own subdirectory. Enter the complete path to each dataset directory in the metadata spreadsheet, along with that dataset's metadata.
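
A small Python sketch, with hypothetical helper names, that flags files sitting directly in the landing-zone root and lists the full path of each dataset subdirectory so it can be copied into the metadata spreadsheet.

```python
from pathlib import Path


def stray_root_files(landing_zone):
    """Return names of non-directory entries placed directly in the landing-zone root."""
    root = Path(landing_zone)
    return sorted(p.name for p in root.iterdir() if not p.is_dir())


def dataset_paths(landing_zone):
    """Return the absolute path of each dataset subdirectory in the root."""
    root = Path(landing_zone).resolve()
    return sorted(str(p) for p in root.iterdir() if p.is_dir())
```

An empty result from stray_root_files means the root contains only subdirectories, as recommended above.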

Modality-Specific Recommendations

Below are modality-specific recommendations. If your exact data type is not listed, model your submission on the closest type or contact bil-support@psc.edu for more information.

Whole brain datasets containing only lower-level data

To tie metadata to an image dataset (a stack of z-planes), each image dataset must be uploaded to a separate subdirectory. For example, if you had an experiment containing 5 mouse datasets (with sample names mouse1, mouse2, mouse3, mouse4, mouse5) that you wanted to include as a single submission collection, you would create 5 subdirectories in the landing zone for that collection, one for each mouse dataset, e.g.:

/bil/lz/testuser/abcdef0123456789/mouse1
/bil/lz/testuser/abcdef0123456789/mouse2
/bil/lz/testuser/abcdef0123456789/mouse3
/bil/lz/testuser/abcdef0123456789/mouse4
/bil/lz/testuser/abcdef0123456789/mouse5

If you have additional metadata (not required by BIL) that you want to include with a dataset, place it in a subdirectory called extras inside that dataset's subdirectory:

/bil/lz/testuser/abcdef0123456789/mouse1/extras
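
The layout above can be created with a short Python sketch (a hypothetical helper; creating the directories by hand or with mkdir works equally well). The optional extras subdirectory is created only when with_extras=True.

```python
from pathlib import Path


def make_sample_dirs(landing_zone, sample_names, with_extras=False):
    """Create one subdirectory per sample, optionally with an extras/ inside each."""
    created = []
    for name in sample_names:
        sample_dir = Path(landing_zone) / name
        sample_dir.mkdir(parents=True, exist_ok=True)
        if with_extras:
            # Optional home for additional metadata not required by BIL.
            (sample_dir / "extras").mkdir(exist_ok=True)
        created.append(sample_dir)
    return created
```

For the five-mouse example: make_sample_dirs("/bil/lz/testuser/abcdef0123456789", [f"mouse{i}" for i in range(1, 6)]).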

Whole brain data containing a mix of lower-level and higher level data

If your data contains a mix of lower-level and higher-level data, note that in our system each level of data is considered a unique dataset. We recommend that you start by following the advice above for whole brain datasets containing only lower-level data. For example, if you had an experiment containing 5 mouse datasets (with sample names mouse1, mouse2, mouse3, mouse4, mouse5) that you wanted to include as a single submission collection, you would create 5 subdirectories in the landing zone for that collection, one for each mouse dataset, e.g.:

/bil/lz/testuser/abcdef0123456789/mouse1
/bil/lz/testuser/abcdef0123456789/mouse2
/bil/lz/testuser/abcdef0123456789/mouse3
/bil/lz/testuser/abcdef0123456789/mouse4
/bil/lz/testuser/abcdef0123456789/mouse5

Next, within each sample subdirectory, create subdirectories that indicate the level of processing done on the data. Recommended subdirectory names include: raw, unstitched, stitched, aligned, tracings, masks. If multiple resolutions are provided, use names indicative of the resolution (e.g. 10x, 30x) or some other descriptive label such as 40micron. Also create a readme.txt file in the required extras subdirectory that describes the contents of each subdirectory. For example:

/bil/lz/testuser/abcdef0123456789/mouse1/stitched
/bil/lz/testuser/abcdef0123456789/mouse1/aligned
/bil/lz/testuser/abcdef0123456789/mouse1/tracings
/bil/lz/testuser/abcdef0123456789/mouse1/extras/readme.txt

You may also place any additional metadata (not required by BIL) that you want to include with each dataset in that dataset's extras subdirectory.
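
A Python sketch that creates the processing-level subdirectories for one sample and writes extras/readme.txt with one line per subdirectory. The helper name and the "name/: description" line format are assumptions; no particular readme format is prescribed here, so write whatever descriptions suit your data.

```python
from pathlib import Path


def make_level_dirs(sample_dir, levels):
    """Create one subdirectory per processing level and describe them in extras/readme.txt.

    levels maps each subdirectory name (e.g. "stitched") to a one-line description.
    """
    sample_dir = Path(sample_dir)
    for level in levels:
        (sample_dir / level).mkdir(parents=True, exist_ok=True)
    extras = sample_dir / "extras"
    extras.mkdir(parents=True, exist_ok=True)
    readme = extras / "readme.txt"
    # One "name/: description" line per subdirectory (format is an assumption).
    readme.write_text("".join(f"{name}/: {desc}\n" for name, desc in levels.items()))
    return readme
```

Calling it on /bil/lz/testuser/abcdef0123456789/mouse1 with the keys stitched, aligned and tracings reproduces the example layout above.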

Whole brain datasets containing only higher-level data

It is not unusual to make lower-level image data available before the higher-level analysis is complete. When you are ready to submit the higher-level data, you will need to create a new submission. In this case, use the same sample name as in the prior BIL submission, plus a subdirectory indicating the level of processing (see the recommended names under Whole brain data containing a mix of lower-level and higher level data, above). For example, if SWC tracing files were provided for the sample named "mouse1" at a later date, you might create a directory like:

/bil/lz/testuser/fef9236f6400b832/mouse1/tracings
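
Before uploading a follow-up submission, a sketch like the one below can catch two easy mistakes: a sample directory whose name does not match the earlier submission, and files placed directly in the sample directory instead of in a processing-level subdirectory. You supply the prior sample names yourself; the helper is hypothetical and does not query BIL.

```python
from pathlib import Path


def check_followup(landing_zone, prior_samples):
    """Return human-readable problems found in a follow-up submission layout."""
    problems = []
    sample_dirs = sorted(p for p in Path(landing_zone).iterdir() if p.is_dir())
    for sample_dir in sample_dirs:
        if sample_dir.name not in prior_samples:
            problems.append(f"{sample_dir.name}: sample name not used in the prior submission")
        # Data should sit in a processing-level subdirectory such as tracings/.
        loose = sorted(p.name for p in sample_dir.iterdir() if p.is_file())
        if loose:
            problems.append(f"{sample_dir.name}: files outside a processing-level subdirectory: {loose}")
    return problems
```

For the example above, check_followup("/bil/lz/testuser/fef9236f6400b832", {"mouse1"}) should return an empty list.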

Spatial transcriptomic datasets

We recommend that spatial transcriptomic submissions include both the raw and processed image data, as well as the processed CSV files. Please submit your image data using the following directory structure:

/bil/lz/testuser/abcdef0123456789/mouse1/raw/mouse1_sample1
/bil/lz/testuser/abcdef0123456789/mouse1/raw/mouse1_sample2
/bil/lz/testuser/abcdef0123456789/mouse2/raw/mouse2_sample1
/bil/lz/testuser/abcdef0123456789/mouse2/raw/mouse2_sample2
/bil/lz/testuser/abcdef0123456789/mouse1/processed/mouse1_sample1
/bil/lz/testuser/abcdef0123456789/mouse1/processed/mouse1_sample2
/bil/lz/testuser/abcdef0123456789/mouse2/processed/mouse2_sample1
/bil/lz/testuser/abcdef0123456789/mouse2/processed/mouse2_sample2

Place your experiment-wide processed data, as CSV files, in a subdirectory called processed at the root of the landing zone:

/bil/lz/testuser/abcdef0123456789/processed

If you have additional metadata that you want to include, place it in a subdirectory called extras along with any other notes that you want to include about the data:

/bil/lz/testuser/abcdef0123456789/extras
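
The spatial transcriptomics layout can be sketched the same way. The helper below is hypothetical: it takes a mapping from each dataset directory to its sample directories and creates matching raw and processed image directories, plus the experiment-wide processed and extras directories at the landing-zone root.

```python
from pathlib import Path


def make_spatial_layout(landing_zone, samples_by_dataset):
    """Create <dataset>/raw/<sample> and <dataset>/processed/<sample> image directories,
    plus experiment-wide processed/ and extras/ directories at the landing-zone root."""
    root = Path(landing_zone)
    for dataset, samples in samples_by_dataset.items():
        for stage in ("raw", "processed"):
            for sample in samples:
                (root / dataset / stage / sample).mkdir(parents=True, exist_ok=True)
    # Experiment-wide CSV files go in processed/; extra metadata and notes go in extras/.
    (root / "processed").mkdir(parents=True, exist_ok=True)
    (root / "extras").mkdir(parents=True, exist_ok=True)
```

For example, make_spatial_layout("/bil/lz/testuser/abcdef0123456789", {"mouse1": ["mouse1_sample1", "mouse1_sample2"]}) creates the mouse1 entries shown above along with the root-level processed and extras directories.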