Best Practices for Filenaming, Organizing, and Working with Data
Tabular data, file names, and file organization should be handled in a way that makes it easy for both humans and machines to access, analyze, and use the data stored in those files. Below are some basic principles for working with data files.
File Naming and Organization
The five precepts of file naming and organization:
- Have a distinctive, human-readable name that gives an indication of the content.
- Follow a consistent pattern that is machine-friendly, i.e., one that:
  - is not over 25 characters.
  - does not use spaces or unusual characters.
- Organize files into directories (when necessary) that follow a consistent pattern.
- Avoid repetition of semantic elements among file and directory names. For example, avoid paths like these, where "forest plot" appears in both the parent directory name and each subdirectory name:
  - 2018 forest plot measurements/forest_plot1/dataset3.txt
  - 2018 forest plot measurements/forest_plot2/dataset3.txt
- Have a file extension that matches the file format (no changing extensions!).
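Some of these precepts can be checked automatically. The sketch below is illustrative only. The function names `check_file_name` and `repeated_terms` are hypothetical, and the 25-character limit and the ban on spaces come from the list above. The "safe" character set (letters, digits, dot, underscore, hyphen) is an assumption.

The extension test only confirms that an extension exists. It cannot tell whether the extension matches the real file format. The repetition check is a rough heuristic: it flags any word that appears in more than one path component.

```python
import re
from pathlib import PurePosixPath

# The 25-character limit comes from the precepts; the "safe" character
# set (letters, digits, dot, underscore, hyphen) is an assumption.
MAX_NAME_LENGTH = 25
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def check_file_name(name: str) -> list[str]:
    """Hypothetical checker: list machine-friendliness problems in one file name."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    # Spaces are reported on their own; other characters are checked only
    # when no spaces are present.
    if " " in name:
        problems.append("contains spaces")
    elif not SAFE_NAME.fullmatch(name):
        problems.append("contains unusual characters")
    # Only checks that an extension exists, not that it matches the format.
    if not PurePosixPath(name).suffix:
        problems.append("has no file extension")
    return problems


def repeated_terms(path: str) -> set[str]:
    """Hypothetical heuristic: words that appear in more than one path component."""
    parts = PurePosixPath(path).parts
    counts: dict[str, int] = {}
    for index, part in enumerate(parts):
        # Drop the extension from the final file name before splitting it.
        if index == len(parts) - 1:
            part = PurePosixPath(part).stem
        # Count each alphabetic word at most once per path component.
        for word in set(re.findall(r"[a-z]+", part.lower())):
            counts[word] = counts.get(word, 0) + 1
    return {word for word, count in counts.items() if count > 1}


print(check_file_name("2018 forest plot measurements.txt"))
# ['longer than 25 characters', 'contains spaces']
print(sorted(repeated_terms("2018 forest plot measurements/forest_plot1/dataset3.txt")))
# ['forest', 'plot']
```

The repetition check works on whole words only. It would not treat "plots" and "plot" as the same term, so it can miss some repetitions. Its output is a prompt for human review rather than a strict rule.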
Resources
- Smithsonian Institution Archives: Setting up Electronic Files and Overall File Management.
- NIST: Electronic File Organization Tips (pdf).
- Purdue Library Data Management for Undergraduate Researchers: File Naming Conventions.
- Stanford Libraries: Best practices for file naming.
- University of Edinburgh Records Management: Naming Conventions.
Working with Tabular Data (Spreadsheets)
Putting data into simple tables is one of the most common ways to store and then work with data. Below are some basic principles for organizing data into tables so that both humans and machines can use that data.
For more detailed guidelines and examples, see the resources linked below and the Smithsonian Data Management Best Practices for Naming and Organizing Files and Working with Tabular Data PDFs.
- Do not rely on special formatting such as cell colors, text bolding, or other visual cues to provide meaning.
- Do not include figures, analyses, or charts alongside simple tabular data (e.g., embedded in an Excel workbook). Store them separately, as figures in a standard format such as PDF or TIFF.
- Reserve the first row in a table for column headers, also called field names. The first row should never be empty.
- Each row in your file should represent a single record or data point, e.g., the measurements of one sample or the response of one individual.
- Standardize the format of content within a column, e.g., all numbers or all text.
- Store final copies of tabular data without macros or formulas, in a non-proprietary format such as comma- or tab-separated values (.csv or .txt).
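Several of the table rules can be checked on a delimited text export. The sketch below uses only Python's standard `csv` module, and the function name `check_table` is hypothetical.

It tests the following, under stated assumptions:
- the first row holds non-empty headers;
- every row has one value per column (one record per row);
- each column is either all numbers or all text, which is one possible reading of "standardize the format". Blank cells are skipped as missing values, which is also an assumption.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """True if the cell text parses as a number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def check_table(text: str, delimiter: str = ",") -> list[str]:
    """Hypothetical checker for a few of the tabular-data rules."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Rule: the first row holds the column headers and is never empty.
    if not rows or not any(cell.strip() for cell in rows[0]):
        return ["first row is empty; it should hold the column headers"]
    header = rows[0]
    problems = []
    for col, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {col} has no header")
    # Rule: one record per row, so each row should have one value per column.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {line_no} has {len(row)} fields, expected {len(header)}")
    # Rule: consistent content per column (here: all numbers or all text).
    # Blank cells are skipped, an assumption about how missing values look.
    for col, name in enumerate(header):
        values = [row[col] for row in rows[1:] if col < len(row) and row[col].strip()]
        if len({_is_number(v) for v in values}) > 1:
            problems.append(f"column '{name}' mixes numbers and text")
    return problems


print(check_table("sample_id,mass_g\nS1,12.5\nS2,heavy\n"))
# ["column 'mass_g' mixes numbers and text"]
```

For tab-separated files, pass `delimiter="\t"`. Rules about visual formatting and embedded charts cannot be checked this way, because a plain-text export has already discarded them.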
Resources
- Borer et al.: Some Simple Guidelines for Effective Data Management (pdf).
- Hook et al.: Best Practices for Preparing Environmental Data Sets to Share and Archive.
- Cornell University Research Data Management Service Group: Preparing tabular data for description and archiving.
Last Updated October 6, 2023