Describing Your Project

To enable discovery, reuse, and citation, all research data should have both a corresponding Data Dictionary, which defines and describes the data itself, and descriptive Project Metadata (also known as citation metadata, a data record, a metadata record, or a dataset record). The information supplied in the project metadata should be sufficient to enable you and others to find, access, and properly cite your data, and can include important details about your research.

Data Dictionaries

Providing detailed descriptions of your data in a data dictionary can enable correct interpretation, re-use, and better management of your data in the future.

A data dictionary, "read me" file, or key explains the contents of the dataset. Any information that someone would need to interpret or re-use your data should be included in the data dictionary. It may include full definitions of any abbreviations used, units of measurement, allowable values in a field, data types, thesauri or controlled vocabularies used, and other important details of the data elements, along with a brief description of the provenance or parameters of the data, e.g., the date or location the data was collected.
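As a rough illustration, a data dictionary can be kept as a simple table with one row per field. The sketch below assumes a hypothetical dataset with three columns (`site_id`, `temp_c`, `collected_on`). The field names, units, allowed values, and the `data_dictionary_to_csv` helper are all invented for illustration and are not taken from the Smithsonian guidelines.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column of an imagined dataset.
# Every name and value here is an illustrative assumption.
DATA_DICTIONARY = [
    {
        "field": "site_id",
        "definition": "Identifier of the collection site",
        "data_type": "string",
        "units": "",
        "allowed_values": "S01-S99",
    },
    {
        "field": "temp_c",
        "definition": "Water temperature at time of sampling",
        "data_type": "float",
        "units": "degrees Celsius",
        "allowed_values": "-5.0 to 45.0",
    },
    {
        "field": "collected_on",
        "definition": "Date the sample was collected",
        "data_type": "date",
        "units": "",
        "allowed_values": "YYYY-MM-DD",
    },
]


def data_dictionary_to_csv(entries):
    """Render the data dictionary as CSV text, one row per field."""
    columns = ["field", "definition", "data_type", "units", "allowed_values"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


if __name__ == "__main__":
    print(data_dictionary_to_csv(DATA_DICTIONARY))
```

Saving the resulting text alongside the data file keeps definitions, units, and allowed values in one place that travels with the dataset.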

For an example of a data dictionary and more detailed guidelines, download Smithsonian Data Management Best Practices: Describing Your Data: Data Dictionaries (pdf).

Resources for creating data dictionaries:

Project Metadata

Project Metadata allows you to describe your project. Also known as citation metadata, a data record, a metadata record, or a dataset record, project metadata is the who, what, where, when, and how about your data. Including descriptive information (metadata) when depositing and/or sharing your research data enables discoverability and makes it easier for others to cite and reuse your work. The information supplied in the project metadata should be sufficient to enable you and others to find and properly cite your data, and can include important details about your research.

You wouldn't let your dog outside without a collar on: don't let your data outside without metadata!

(Image: three black and white pictures of dogs with captions comparing their tags to descriptive metadata)

ALWAYS include:

  • Creator/owner(s): including complete names, institutional affiliations (including SI unit) and any ORCIDs.
  • Title: a meaningful and descriptive title, prefaced with the word "Dataset".
  • Publication Date: the date the data is made public or, if the data is restricted and not publicly available, the date it was deposited.
  • Persistent or globally unique identifier: a DOI (Digital Object Identifier) is the preferred Globally Unique Identifier, but a URN, Handle, EzID or ARK are acceptable. If no persistent identifier is available, a URL/URN for the data is mandatory. If you are publishing your data in Smithsonian's Figshare for Institutions (SI staff), a DOI is automatically assigned and can be reserved prior to publication.

Include when possible:

  • Resource type: the general format of the data, e.g., tabular data, audio files, sensor data, or images.
  • Publisher: usually this will be the hosting location or organization with which you have deposited your data. Use the organization name, a URL, or URN for the repository.
  • Grant: either the name, e.g., "CLIR Hidden Collections 2017" or the grant number associated with the dataset.
  • Description: Abstract for the dataset (not the paper!) that covers who, what, where, when, why in narrative format. This is akin to an abstract for the 'Methods' section of a paper.
  • Preferred citation format: e.g., MLA or APA.
  • Related publications: a published article, code, or related datasets, referenced with a resolvable URL or a DOI.
  • Rights, restrictions, and/or licenses: any that should be applied to the data.
  • Version: a version number if applicable.
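To make the two lists above concrete, here is a minimal sketch of a project metadata record held as a Python dictionary, with a check for the "ALWAYS include" fields. The key spellings (`creators`, `title`, `publication_date`, `identifier`), the `missing_required_fields` helper, and every value in `example_record` are hypothetical placeholders, not a schema prescribed by the Smithsonian or by any repository.

```python
# Keys mirror the "ALWAYS include" list; the exact spellings are an assumption.
REQUIRED_FIELDS = ("creators", "title", "publication_date", "identifier")


def missing_required_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder record: names, dates, and identifiers are invented for illustration.
example_record = {
    "creators": [
        {
            "name": "Jane Doe",
            "affiliation": "Example research unit",
            "orcid": "0000-0000-0000-0000",  # placeholder, not a real ORCID
        }
    ],
    "title": "Dataset: Example stream temperature readings",
    "publication_date": "2021-01-01",
    "identifier": "",  # no DOI or URL assigned yet in this sketch
    # Optional fields from the "Include when possible" list
    "resource_type": "tabular data",
    "version": "1.0",
}


if __name__ == "__main__":
    print(missing_required_fields(example_record))
```

Run as a script, this reports `identifier` as the one required field still missing, which is the kind of gap worth catching before a dataset is deposited.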

For an example of a project metadata file and more detailed guidelines, download Smithsonian Data Management Best Practices: Describing Your Project: Citation Metadata (pdf).

Resources for data citation

Want to learn more? The USGS persuasively explains why we need metadata and uses for metadata.

Last Updated July 29, 2021