What do we mean when we say Digital Library?
Digital Library is the Smithsonian Libraries' way of referring to the online resources we've created for use by the public. These resources often draw on digitized collection items, such as digitized books, photos, and videos.
The Digital Library contains over 5,000 scanned books along with photo collections, videos, scholarly bibliographies, virtual exhibitions, and searchable databases of resources related to art, history, science, technology and library and museum collections or exhibitions.
Searching the Digital Library
For now, the only way to search is the general website search box in the upper right-hand corner of the page. It searches all Digital Library content along with all of the other website content. The search runs on a Google Search Appliance, so if you're familiar with doing advanced searches in Google, those tricks will work on our site as well. We are working on creating a search specifically for our online books.
Image and Content Reuse
The Smithsonian Libraries has been scanning books and making them available online since 1997, but it wasn't until 2007, when we joined the newly formed Biodiversity Heritage Library (BHL), that we began digitizing at a high volume. Since then, we have digitized over 23,000 items containing over 9 million pages for the BHL from our collections in zoology, botany, agriculture, paleozoology, and paleobotany, along with some history, anthropology, horticulture, and geology titles.
In 2010 we began a project to similarly digitize titles in our Art, History and Technology collections. This project was named the Cultural Heritage Library (CHL) as a counterpoint to the BHL. These titles can be found in our Digital Library. As of 2014, more than 6,500 items with nearly 2.5 million pages are available, including rare titles in the history of science, African exploration, automotive and railway history, and art and design.
All of our titles are also freely available on the Internet Archive, which is searchable in Open Library.
More information about book digitization can be found in our Books FAQ.
Instructions for downloading entire books, or single images from books, can be found in the Books FAQ.
Technical Specs, aka FAQ for Library Colleagues
If your questions are not answered here, please send us an email through our Contact Us form.
What platform is the Digital Library running on?
What metadata standards do you use?
What are your image digitization standards?
What type of scanners do you use?
Do you digitize for preservation, and if so, what standards and processes are you following?
The books portion of the Digital Library is built in Drupal, as is all website content created after 2012. Digital Library book image files are stored on and served from the Internet Archive. Book metadata is harvested and stored in Drupal, and the page images are served up via a custom Drupal module using the Internet Archive page turner. Other parts of the Digital Library, particularly older virtual exhibitions, bibliographies, and special collections inventories, were all developed using ColdFusion over MSSQL databases.
Digitized book metadata follows the model set up by the Internet Archive. It includes descriptive records in MARCXML, with item-level and page-level data in separate XML files. The MARCXML is generated from our catalog MARC records at the time of scanning. In the Digital Library, we make the item-level data available as RIS (for citations) and as Linked Open Data in RDFa. The page-level data (enumeration, page type, etc.) is either created by the Internet Archive using their Biblio software, or created by us at the time of scanning using our MACAW (Metadata Collection And Workflow) tool, which was developed in-house.
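To give colleagues a feel for how a MARCXML record relates to an RIS citation, here is a minimal sketch in Python. It is not the code that runs in our Drupal site. The sample record, the helper names (`subfield`, `marcxml_to_ris`), and the choice of fields (100, 245, 260) are assumptions made for illustration, and the punctuation trimming is deliberately crude.

```python
import xml.etree.ElementTree as ET

# Namespace used by the MARCXML (MARC 21 "slim") schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record, invented for illustration only.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title /</subfield>
    <subfield code="c">by Jane Doe.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="b">Example Press,</subfield>
    <subfield code="c">1901.</subfield>
  </datafield>
</record>"""


def subfield(record, tag, code):
    """Return the first matching subfield's text, or None if it is absent.

    Trailing ISBD punctuation (" / : , .") is stripped. This is a crude
    rule that would also remove the period after a trailing initial.
    """
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    if node is None or not node.text:
        return None
    return node.text.strip().rstrip(" /:,.")


def marcxml_to_ris(xml_text):
    """Build a small RIS citation from a single MARCXML <record>."""
    record = ET.fromstring(xml_text)
    pairs = [
        ("TY", "BOOK"),                           # reference type
        ("AU", subfield(record, "100", "a")),     # main author
        ("TI", subfield(record, "245", "a")),     # title proper
        ("PB", subfield(record, "260", "b")),     # publisher
        ("PY", subfield(record, "260", "c")),     # publication year
    ]
    # RIS lines are "TAG", two spaces, a hyphen, a space, then the value.
    lines = [f"{tag}  - {value}" for tag, value in pairs if value]
    lines.append("ER  - ")                        # end-of-record marker
    return "\n".join(lines)


if __name__ == "__main__":
    print(marcxml_to_ris(SAMPLE_MARCXML))
```

Fields that are missing from the record are simply left out of the citation, which keeps the output valid even for sparse catalog records.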
Many (but not all) images in the Galaxy of Images have embedded metadata following the Smithsonian standard for embedded metadata in images. Vocabularies used in descriptions include Dublin Core, LCSH, AAT, SKOS, bibo, and Romaine subject headings for our Trade Literature.
We follow the FADGI standards for the creation of still images. Because many of our books are digitized by the Internet Archive, which does not follow those standards, we don't guarantee that every page image will meet all of them. However, all images in non-folio-size books, excluding "foldouts", should be at least 300 ppi.
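For colleagues who want to check a page image against the 300 ppi figure, the arithmetic is pixels divided by physical inches, taken along each dimension. The sketch below is illustrative only: the function names and the sample page dimensions are assumptions, and it covers resolution alone, not the other FADGI criteria.

```python
MIN_PPI = 300  # threshold given above for non-folio, non-foldout pages


def effective_ppi(pixels_wide, pixels_high, inches_wide, inches_high):
    """Return the lower of the horizontal and vertical pixels per inch."""
    if inches_wide <= 0 or inches_high <= 0:
        raise ValueError("physical dimensions must be positive")
    return min(pixels_wide / inches_wide, pixels_high / inches_high)


def meets_minimum(pixels_wide, pixels_high, inches_wide, inches_high,
                  minimum=MIN_PPI):
    """True if the image resolves the page at or above the minimum ppi."""
    ppi = effective_ppi(pixels_wide, pixels_high, inches_wide, inches_high)
    return ppi >= minimum


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch page captured at 1800 x 2700 pixels.
    print(effective_ppi(1800, 2700, 6, 9))
    print(meets_minimum(1800, 2700, 6, 9))
    # The same page captured at 1500 x 2250 pixels falls short.
    print(meets_minimum(1500, 2250, 6, 9))
```

Using the lower of the two values is a conservative choice, so an image that was scaled unevenly does not pass on the strength of one dimension.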
Digitization done in-house creates TIFFs as master images and then uncompressed and compressed JPEG2000s as derivatives. The Internet Archive uses lossy, compressed JPEG2000s as master images.
More information about our equipment can be found on the Digital Library department's page.
We typically do not digitize for preservation, only for access.