Whether you choose to deposit your data in a specialty repository, a general-purpose external repository, or a local SI repository, make sure that the services and terms offered fit the needs of your data.
Given the large number of specialty repositories that exist or are being built for specific data types, specific organisms, and large grant-funded collaborative projects, it is impractical to list every data repository that SI researchers could use to conform to the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Before depositing data in a repository not listed below or in the attached best practices document, you should at a minimum ensure that the repository:
- Has a plan and sufficient funding to ensure its long-term viability.
- Allows export of data and data descriptions in a standards-compliant format, preferably identical to the format you deposited.
Ideally, the repository should also:
- Enable easy citation of your data, including supporting DOIs (either minted by the repository or by SIL).
- Be searchable and indexed by a service such as DataCite or Elsevier's DataSearch.
- Support application of an appropriate license, and embargo of data if necessary.
- Support metadata standards for your data, e.g., ISO 19115 for geographic data.
re3data.org, a registry of research data repositories, is an excellent source of detailed information on individual repositories. You may also want to consult the PLOS ONE list of recommended repositories (organized by data format and discipline).
Any repository managed by a U.S. federal agency or national laboratory (e.g., NIH's GenBank, NASA's National Space Science Data Center, or ORNL's DAAC) is considered a preferred repository for any SI research data that meet its criteria for deposit. Data repositories run by established U.S. institutions, such as Harvard's Dataverse, are also acceptable.
SI has three centrally managed repositories that accept Smithsonian-produced data: Figshare for Institutions, SRO, and SIdora. All three are discipline-agnostic, support a wide range of file types, and accept (or mint) DOIs for citation. Figshare is a commercial platform designed for sharing research data; it is backed up in the cloud and centrally administered by SLA and OCIO. SRO and SIdora provide actively managed, backed-up, secure storage in Smithsonian's Herndon Data Center. All three platforms support both open (publicly accessible) and closed (private) data, though Figshare is designed primarily for open data.
- Figshare for Institutions – is best for sharing data that need a DOI, including data that underlie peer-reviewed publications; bounded datasets of mixed formats; or data that are periodically updated and need to be versioned. See the Figshare Confluence site for more information.
- SRO – is best for smaller (<50GB), fixed (inactive) datasets that accompany or support publications deposited in SRO.
To deposit data and publications in SRO, self-deposit using the forms on the internal staff pages, or contact email@example.com.
- SIdora – is best for larger or more complicated datasets, including actively updated datasets.
To deposit data in SIdora, contact Beth Stern or email firstname.lastname@example.org.
If you or your publisher prefer to deposit with a non-SI repository, there are four general-purpose repositories that support FAIR principles. Their features are compared below. Following the grid is a glossary that clarifies the terms we've used as comparison criteria.
|Repository|Dryad|Figshare for Institutions|Open Science Framework (OSF)|Zenodo|
|---|---|---|---|---|
|More information|More @ re3data|More @ SI|More @ re3data|More @ re3data|
|Fees (2018)|$120 per deposit (SI is not a member and cannot get a discount)|free to SI staff|free|free|
|Formats accepted|office documents, scientific & statistical data, plain text, structured text, software, source code, other|office documents, images, structured graphics, audiovisual data, raw data, plain text, archived data|any (no restrictions on file types)|any (no restrictions on file types)|
|Persistent identifiers|will assign a DOI|integrates with ORCID; can reserve (pre-publication) or assign (at time of publishing) an SI-specific DOI|supports ORCID; will assign ARK and DOI|supports ORCID; will assign a DOI or use a provided DOI|
|Access options|embargoed (only for certain publishers)|open; embargoed; restricted; confidential|open; restricted; closed|open; embargoed; closed; restricted|
|Licenses available|CC0|CC0, CC-BY, CC-BY-NC, MIT, GNU, PGLv3, Apache 2.0|CC (all), Apache, MIT, GNU, other|CC (all), other|
|Versioning|yes|yes|yes|yes; updated files are considered new versions and receive new DOIs|
Fees (2018): Fees for depositing data or maintaining access to them. Verify current fees before depositing your work.
Formats accepted: Data formats the repository accepts for deposit. Common file formats are usually abbreviated by their file extension, e.g., .xlsx, .csv. Accepted formats may include proprietary or uncommon file formats from software specific to one discipline.
Persistent identifiers: Persistent identifiers are registered unique strings (numeric or alphanumeric) that allow your deposit to be referenced easily. Notes in this field indicate whether the repository assigns identifiers for you or gives you a place to record a persistent identifier you have created.
- DOI: A Digital Object Identifier is a persistent unique identifier assigned to a digital object by a registration agency. Because DOIs are registered, the DOI remains "resolvable" even if the content changes location; that is, it still links to the content.
- ARK: An Archival Resource Key is a persistent URL assigned by one of several registered naming authorities, following the ARK scheme.
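To make the idea of a "resolvable" identifier concrete, here is a minimal Python sketch that turns a DOI or an ARK into a URL a browser can follow. It assumes the public resolvers https://doi.org/ (for DOIs) and https://n2t.net/ (for ARKs), neither of which is named above. The helper names `doi_to_url` and `ark_to_url` and the identifiers in the usage lines are hypothetical, and the DOI pattern is a simplified check rather than the full DOI syntax.

```python
import re
from urllib.parse import quote

# Assumption: DOIs resolve through doi.org and ARKs through the N2T resolver.
# Helper names below are hypothetical, for illustration only.

# Prefixes people commonly paste in front of a DOI; stripped before validation.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Simplified DOI check: "10." + registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI, with or without a prefix."""
    doi = doi.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, keeping "/" intact.
    return "https://doi.org/" + quote(doi, safe="/")


def ark_to_url(ark: str) -> str:
    """Return a URL for an ARK via the N2T (Name-to-Thing) resolver."""
    ark = ark.strip()
    if not ark.lower().startswith("ark:"):
        raise ValueError(f"not an ARK: {ark!r}")
    return "https://n2t.net/" + ark


if __name__ == "__main__":
    # Hypothetical identifiers, not real deposits.
    print(doi_to_url("doi:10.1234/example.5678"))  # https://doi.org/10.1234/example.5678
    print(ark_to_url("ark:/12345/x9example"))      # https://n2t.net/ark:/12345/x9example
```

The point of the sketch is that the identifier itself never changes; only the resolver's record of where the content lives is updated, which is why a DOI or ARK in a citation keeps working after a repository migrates its files.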
Access options: The repository may let you control who can find, view, or download your data, and when. An embargo period may be available to keep data hidden for a specified amount of time.
Licenses available: The types of licenses the repository lets you apply to your data. Licenses specify the permitted use and/or reuse of data. They do not control who can view or access your data (see "Access options" above).
For more information about choosing licenses for your data, see the Digital Curation Centre's excellent guide: http://www.dcc.ac.uk/resources/how-guides/license-research-data . Generally, datasets created by federal employees in the course of their duties are considered to be in the public domain (with some exceptions). The most common licenses available are Creative Commons (CC) licenses, but many repositories also offer software licenses (MIT, GNU) that may be more appropriate for code associated with datasets.
- CC0 License: Creative Commons Zero (https://creativecommons.org/choose/zero/). Waives all copyright and related or neighboring rights that you have over the work.
- CC-BY-NC License: Creative Commons Attribution-NonCommercial
- CC-BY-ND License: Creative Commons Attribution-NoDerivatives
Versioning: Whether the repository offers automatic versioning of data when deposited. Versioning can be important for datasets that are periodically updated. Some repositories provide automatic versioning or version control that helps clarify which datasets were used to produce which outputs (publications, etc.).
Usage statistics: Repositories may provide statistics on the use of the materials in the repository. These may include information on individual datasets, such as downloads and views.