Macaw

In the summer of 2010, the Smithsonian Institution Libraries, with a grant from the Atherton Seidell Endowment Fund, developed a process to scan folio volumes, large fold-outs, and other materials not suited to our existing digitization workflow. As part of this process, the Macaw tool was developed to collect page-level metadata and manage the scanned pages. The result is a complete digital version of the item, ready to be shared with external systems such as the Biodiversity Heritage Library and the Internet Archive.

Summary

Macaw performs three major tasks in the scanning process:

  • Import and management of the images from the scanner or camera.
  • Collection of the page-level metadata that describes the physical aspects of the page.
  • Post-processing of the digital book and its export or upload to other systems.

Customized Integration

Macaw is only one step in a larger process, so it is built in a modular manner and is customized to a system’s unique needs through two sets of required PHP objects. The first set ingests metadata about new items from an external system. The second set consists of one or more export modules that share the completed item with other systems. Macaw also has a number of configuration settings that are used to integrate it into an existing server setup.
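
The two module roles described above can be pictured with a small sketch. It is written in Python purely for illustration; Macaw's actual modules are PHP objects, and every name here (`Item`, `ImportModule`, `ExportModule`, `fetch_new_items`, `export`, `run_exports`) is hypothetical rather than taken from Macaw's code.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Item:
    """A book or other object moving through the workflow (hypothetical shape)."""
    identifier: str
    metadata: dict = field(default_factory=dict)


class ImportModule(ABC):
    """Role 1: ingest metadata about new items from an external system."""

    @abstractmethod
    def fetch_new_items(self) -> list[Item]:
        ...


class ExportModule(ABC):
    """Role 2: share a completed item with one external system."""

    name: str

    @abstractmethod
    def export(self, item: Item) -> bool:
        """Return True if the item was delivered."""


def run_exports(item: Item, exporters: list[ExportModule]) -> dict[str, bool]:
    """Send one completed item to every configured export module."""
    return {exporter.name: exporter.export(item) for exporter in exporters}


# Stand-in implementations, used only to show how the pieces fit together.
class CatalogImport(ImportModule):
    def fetch_new_items(self) -> list[Item]:
        return [Item("example-0001", {"title": "Example folio"})]


class RecordingExport(ExportModule):
    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[str] = []

    def export(self, item: Item) -> bool:
        self.received.append(item.identifier)
        return True
```

Keeping the import side and the export side behind separate interfaces is what would let a site swap in its own catalog source or add another destination without touching the rest of the workflow.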

Metadata Collection

Based on the metadata received during the initial ingest process, Macaw instructs the user on where to place the scanned images, usually in a shared directory. Scanned images are ingested, lightly processed into manageable preview images and thumbnails, and made ready for review by the user. The user accesses Macaw through a web browser to view and add the metadata for the item. Macaw’s interface is built using JavaScript and communicates with the server via AJAX.
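
The light processing step amounts to producing smaller derivatives of each scan. The sketch below only computes the target dimensions while keeping the aspect ratio. The pixel limits (1500 for previews, 180 for thumbnails) and the function names are assumptions chosen for illustration, not values from Macaw, and the sketch is in Python rather than Macaw's PHP.

```python
from __future__ import annotations


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    The aspect ratio is preserved and images are never enlarged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to the nearest pixel, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical size limits, in pixels, for the two derivative images.
DERIVATIVES = {"preview": 1500, "thumbnail": 180}


def derivative_sizes(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Return the target size of each derivative for one scanned page."""
    return {
        name: fit_within(width, height, limit)
        for name, limit in DERIVATIVES.items()
    }
```

For a 6000 by 4000 pixel scan, `derivative_sizes(6000, 4000)` yields a 1500 by 1000 preview and a 180 by 120 thumbnail under these assumed limits.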

Administration

Macaw provides separate user accounts, a few administrative tools to assist in management, extensive logging for both analysis and forensics, and the ability to have a “quality assurance” user to review the work of others before approving an item for sharing to other systems.

Technology

Macaw is built using PHP (v5.3) and the CodeIgniter (v1.7) framework. It uses the PostgreSQL (v8.4) database server, but can easily be modified for MySQL and even SQLite. The JavaScript/AJAX interface is built using the Yahoo! User Interface (YUI) Library, version 2. Currently Macaw is best suited to the Firefox or Chrome web browsers. Macaw was built on Apache (v2.2) running on Linux or OS X, but can likely be run on Windows and on other web servers with minimal changes.

If you are interested in running a local instance of Macaw, or would like to contribute to its development, the code is available on GitHub.