Describing Your Project: Citation Metadata

In addition to a data dictionary that describes the information in a dataset, research data should be accompanied by citation metadata, also known as project metadata, a data record, a metadata record, or a dataset record. The information supplied in this project description should be sufficient for you and others to find and properly cite your data, and it can include important details about your research.

About Citation Metadata

A metadata record gives the basic who, what, where, and when of the data. It is a high-level description that others can use to cite your data. It may be submitted as a separate file alongside the dataset when you deposit in a repository, or entered into a form and displayed by the repository.
You wouldn't let your dog outside without a collar on: don't let your data outside without metadata!

ALWAYS include:

  • Creator/owner(s): complete names, institutional affiliations (including Smithsonian unit), and any ORCID iDs.
  • Title: a meaningful and descriptive title, prefaced with the word "Dataset:" (see note 1).
  • Publication Date: the year (and, if relevant, month and day) the data is made public or, if the data is restricted and not publicly available, the date it was deposited.
  • Persistent or globally unique identifier: a DOI is the preferred unique identifier, but a URN, Handle, EZID, or ARK is acceptable. If no persistent identifier is available, a URL/URN for the data is mandatory. If you are publishing your data in Smithsonian's Figshare for Institutions, a DOI is automatically assigned and can be reserved prior to publication.
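
The four required elements above can be checked mechanically before deposit. The following is a minimal sketch in Python, assuming the record is held as a plain dictionary; the field names (`creators`, `title`, `publication_date`, `identifier`), the function name `check_required`, and all values in the example record are hypothetical placeholders, not a schema defined by any repository.

```python
# Hypothetical field names for the four "ALWAYS include" elements.
REQUIRED_FIELDS = ("creators", "title", "publication_date", "identifier")


def check_required(record: dict) -> list[str]:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        # Treat absent keys and empty values ("" or []) the same way.
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    title = record.get("title", "")
    if title and not title.startswith("Dataset:"):
        problems.append('title should begin with "Dataset:"')
    return problems


# Placeholder record: every value below is invented for illustration.
example = {
    "creators": [
        {
            "name": "Researcher, Jane Q.",
            "affiliation": "Example Unit, Smithsonian Institution",
            "orcid": "0000-0000-0000-0000",  # placeholder, not a real ORCID iD
        }
    ],
    "title": "Dataset: Example field measurements",
    "publication_date": "2023",
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
}

print(check_required(example))
```

A record that lacks a persistent identifier would still pass this check if it supplies a URL or URN in `identifier`, which matches the fallback described above.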

Include when possible:

  • Resource type: the general format of the data, e.g., tabular data, audio files, sensor data, or images.
  • Publisher: usually this will be the hosting location or organization with which you have deposited your data. Use the organization name, a URL, or URN for the repository.
  • Grant: either the name, e.g., "CLIR Hidden Collections 2017", or the grant number associated with the dataset.
  • Description: an abstract for the dataset (not the paper!) that covers who, what, where, when, and why in narrative format. This is akin to an abstract for the 'Methods' section of a paper.
  • Preferred citation format: e.g., MLA or APA.
  • Related publications: a published article, code, or related datasets, referenced with a resolvable URL or a DOI.
  • Rights, restrictions, and/or licenses: any that should be applied to the data.
  • Version: a version number if applicable.
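
To show how the required and optional elements combine, here is a small Python sketch that assembles a citation string from a record. It uses a generic "creators (year). title. version. publisher. identifier" layout chosen for illustration, not APA, MLA, or any repository's official format. The function name `format_citation`, the dictionary keys, and every value in the sample record are assumptions.

```python
def format_citation(record: dict) -> str:
    """Build a generic citation string; optional elements are skipped if absent."""
    creators = "; ".join(c["name"] for c in record["creators"])
    year = record["publication_date"][:4]  # assumes the date starts with YYYY
    parts = [f"{creators} ({year})", record["title"]]
    if record.get("version"):
        parts.append(f"Version {record['version']}")
    if record.get("publisher"):
        parts.append(record["publisher"])
    parts.append(record["identifier"])
    # Strip trailing periods so joined parts do not end up with "..".
    return ". ".join(part.rstrip(".") for part in parts)


# Placeholder record: every value below is invented for illustration.
record = {
    "creators": [{"name": "Researcher, Jane Q."}],
    "title": "Dataset: Example field measurements",
    "publication_date": "2023-10-06",
    "identifier": "https://doi.org/10.0000/example",  # placeholder DOI
    "publisher": "Example Repository",
    "version": "1.0",
}

print(format_citation(record))
```

Keeping the elements as separate fields, rather than storing only a finished citation string, lets the same record be rendered in whichever citation format a publisher or reader prefers.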

1: When your data citation is entered into Smithsonian Research Online (SRO), the "Dataset:" prefix helps to distinguish it from any related publications, and enables the Institution to report on our compliance with our Public Access Plan.
Don't see your dataset linked in SRO? Contribute the citation for your data! (SI staff)


Last Updated October 6, 2023