Best Practices for Filenaming, Organizing, and Working with Data

Tabular data, file names, and file organization should be handled in ways that make it easy for both humans and machines to access, analyze, and use the data stored in those files. Below are some basic principles for working with data files.

File Naming and Organization

The five precepts of file naming and organization:

  • Have a distinctive, human-readable name that gives an indication of the content.
  • Follow a consistent, machine-friendly pattern, i.e., one that:
    • is no longer than 25 characters.
    • does not use spaces or unusual characters.
  • Organize files into directories (when necessary) that follow a consistent pattern.
  • Avoid repeating semantic elements among file and directory names. For example, don't have both of the following paths, where "forest plot" appears in both the parent and the subdirectory names:
      2018 forest plot measurements/forest_plot1/dataset3.txt
      2018 forest plot measurements/forest_plot2/dataset3.txt
  • Use a file extension that matches the file format (never change an extension without actually converting the file).
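The naming precepts above are simple enough to check with a short script. The Python sketch below is one illustrative way to do so, not an official Smithsonian tool. It assumes that the 25-character limit covers the whole name including the extension, and it treats anything other than ASCII letters, digits, underscores, hyphens, and periods as an "unusual character". The function names `name_problems` and `repeated_terms` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: "unusual characters" = anything except ASCII letters, digits,
# underscores, hyphens, and periods.
ALLOWED = re.compile(r"^[A-Za-z0-9_.-]+$")
MAX_LEN = 25  # assumption: the limit counts the whole name, extension included


def name_problems(name: str) -> list[str]:
    """Return the naming rules that a single file or directory name breaks."""
    problems = []
    if len(name) > MAX_LEN:
        problems.append(f"longer than {MAX_LEN} characters")
    if " " in name:
        problems.append("contains spaces")
    elif not ALLOWED.match(name):
        problems.append("contains unusual characters")
    return problems


def repeated_terms(path: str) -> set[str]:
    """Return words that occur in more than one component of a path."""
    counts: dict[str, int] = {}
    for part in PurePosixPath(path).parts:
        # Split each component into lowercase words on whitespace,
        # underscores, hyphens, periods, and digits.
        words = {w for w in re.split(r"[\s_\-.0-9]+", part.lower()) if w}
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    return {w for w, n in counts.items() if n > 1}
```

For example, `repeated_terms("2018 forest plot measurements/forest_plot1/dataset3.txt")` returns the set `{"forest", "plot"}`, flagging the repetition described above, while `name_problems("forest plot 1.txt")` reports that the name contains spaces.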

Resources

Working with tabular data (spreadsheets)

Putting data into simple tables is one of the most common ways to store and then work with data. Below are some basic principles for organizing data into tables so that both humans and machines can use that data. 

For more detailed guidelines and examples, see the resources linked below and the Smithsonian Data Management Best Practices for Naming and Organizing Files and Working with Tabular Data PDFs.

  • Do not rely on special formatting such as cell colors, bold text, or other visual cues to convey meaning.
  • Do not include figures, analyses, or charts in the same file as simple tabular data (as is common in Excel workbooks). Store them separately as figures in a standard format such as PDF or TIFF.
  • Reserve the first row in a table for column headers, aka field names. The first row should never be empty.
  • Each row in your file should represent a single record or data point, e.g., the measurements of one sample or the response of one individual.
  • Standardize the format of content within a column, e.g., all numbers or all text.
  • Store final copies of tabular data without macros or formulas, in a non-proprietary format such as comma- or tab-separated values (.csv or .txt).
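Several of these table rules can also be checked by a script. The Python sketch below uses only the standard `csv` module; the function names are hypothetical, and it assumes a value counts as a number if Python's `float()` can parse it, with empty cells ignored. It checks that the first row holds non-empty column headers, that every row has as many fields as the header, and that no column mixes numbers with text. Meaning carried by formatting, and formulas, are out of scope, since plain .csv text stores neither.

```python
import csv
import io


def is_number(value: str) -> bool:
    """Assumption: a cell is numeric if float() can parse it."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_table(text: str, delimiter: str = ",") -> list[str]:
    """Return a list of problems found in comma- or tab-separated text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))

    # The first row must exist and hold the column headers.
    if not rows or not any(cell.strip() for cell in rows[0]):
        return ["first row is empty; it should hold the column headers"]
    header = rows[0]
    problems = []
    for i, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {i} has no header")

    # One record per row: every row should have one field per header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no} has {len(row)} field(s), expected {len(header)}"
            )

    # Consistent content: a column should be all numbers or all text.
    for i, name in enumerate(header):
        values = [r[i].strip() for r in rows[1:] if i < len(r) and r[i].strip()]
        kinds = {is_number(v) for v in values}
        if len(kinds) > 1:
            problems.append(f"column '{name}' mixes numbers and text")
    return problems
```

For instance, `check_table("site,dbh_cm\nA,23.5\nB,n/a\n")` reports that the `dbh_cm` column mixes numbers and text. To check a tab-separated file, pass `delimiter="\t"`.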

Resources

Last Updated October 6, 2023