Best Practices for File Naming, Organization, and Working with Data

Create tabular data, name files, and organize files so that both humans and machines can easily analyze and use the data stored in those files. Below are some basic principles for working with data files.

At the bottom of this page are PDFs with more specific guidelines, examples, and further resources.

File Naming and Organization

The five precepts of file naming and organization:

  • Have a distinctive, human-readable name that gives an indication of the content.
  • Follow a consistent, machine-friendly pattern, i.e., no more than 25 characters and no spaces or unusual characters.
  • Organize files into directories (when necessary) that follow a consistent pattern.
  • Avoid repetition of semantic elements among file and directory names.
  • Have a file extension that matches the file format (no changing extensions!).
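
As an illustration of the machine-friendly precept, here is a minimal Python sketch that flags file names breaking it. The function name filename_problems is hypothetical, and two choices are assumptions rather than part of the guidelines: the 25-character limit is counted with the extension included, and "unusual characters" is read as anything other than letters, digits, hyphens, and underscores.

```python
import re

# Limit taken from the precepts above; counting the extension is an assumption.
MAX_LENGTH = 25

# Assumption: a safe name is letters, digits, hyphens, and underscores,
# followed by a single dot and an alphanumeric extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def filename_problems(name: str) -> list[str]:
    """Return a list of human-readable problems found in a file name."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 25 characters")
    if " " in name:
        problems.append("contains spaces")
    if not SAFE_NAME.match(name):
        problems.append("contains unusual characters or lacks an extension")
    return problems


print(filename_problems("soil_ph_2021-06.csv"))
print(filename_problems("My Final Data (version 2)!!.xlsx"))
```

A check like this covers only the mechanical rules. Whether a name is distinctive and indicates its content still needs a human judgment.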

Resources: 

Working with tabular data (spreadsheets)

Putting data into simple tables is one of the most common ways to store and then work with data. Below are some basic principles for organizing data into tables so that both humans and machines can use that data. 

For more detailed guidelines and examples, see the resources linked below and the PDF at the bottom of this page.

  • Do not rely on special formatting such as cell colors, text bolding, or other visual cues to provide meaning.
  • Do not include figures, analyses, or charts in the same file as simple tabular data (for example, in an Excel workbook). Store them separately as figures in a standard format such as PDF or TIFF.
  • Reserve the first row in a table for column headers, aka field names. The first row should never be empty.
  • Each row in your file should represent a single record or data point, e.g., the measurements of one sample or the response of one individual.
  • Standardize the format of content within a column, e.g., all numbers or all text.
  • Store final copies of tabular data without macros or formulas, in a non-proprietary format such as comma- or tab-separated values (.csv or .txt).
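
Some of these principles can be checked automatically once a table is saved as CSV. The Python sketch below uses only the standard library. The function names table_problems and is_number and the sample data are hypothetical, and "standardized content" is interpreted narrowly as "a column does not mix numbers and text", which is an assumption.

```python
import csv
import io


def is_number(value: str) -> bool:
    """Return True if the cell text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def table_problems(csv_text: str) -> list[str]:
    """Check the header row and per-column consistency of CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, records = rows[0], rows[1:]
    problems = []
    # The first row must hold a non-empty name for every column.
    if not header or any(not name.strip() for name in header):
        problems.append("first row has empty column headers")
    # Each column should be all numbers or all text (blank cells are skipped).
    for index, name in enumerate(header):
        values = [row[index] for row in records
                  if index < len(row) and row[index] != ""]
        numeric = [is_number(value) for value in values]
        if any(numeric) and not all(numeric):
            problems.append(f"column '{name}' mixes numbers and text")
    return problems


sample = "sample_id,ph,site\nS1,6.8,north\nS2,about 7,south\n"
print(table_problems(sample))
```

With the sample above, the ph column is reported because "about 7" cannot be read as a number. Formatting-based meaning such as cell colors cannot be detected this way, since it is lost when the table is exported to CSV.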

Resources