Digital Library FAQ
You can search the Digital Library using the general website search box in the upper right-hand corner of the page. Search results will include both Digital Library content and all of the other website content. You can use the facets (checkboxes) on the left side of the search results page to refine your results and limit them to specific content types, such as book or image, as well as to specific subjects, dates, or authors. If you're familiar with doing advanced searches in Google, those tricks will work on our site as well.
The ability to refine the search results using those checkboxes (facets) is new. We'd love to hear any and all opinions on how it's working for you - just Contact Us.
Please see our Copyright and Permissions page for more information about copyright terms used on the site, take-down requests, and obtaining very high resolution images for reuse.
The Smithsonian Libraries has been scanning books and making them available online since 1997, but it wasn't until 2007, when we began digitizing for the Biodiversity Heritage Library (BHL), that we transitioned to "mass" scanning. Since then, we have digitized over 22,000 items containing over 9 million pages for the BHL from our collections in zoology, botany, agriculture, paleozoology and paleobotany, along with some history, anthropology, horticulture, and geology titles.
In 2010 we began to systematically digitize titles in our Art, History, and Technology collections. These titles can be found in the Digital Library Books Online section. As of 2016, more than 7,000 items with nearly 2.5 million pages are available on our website, including rare titles in the history of science, African exploration, automotive and railway history, and art and design. Books are browsable by subject and author, and have been grouped into topic collections.
All of the books in the Digital Library are available for download as PDFs and ePubs, as are their individual page images and plain text (uncorrected OCR).
Follow the links on each book's page (under the book viewer) to download the book as a PDF, ePub, or zipped file containing all the page images in JPEG2000 format.
Downloading One Page from a Book
If you only need a small image from a book, suitable for use online, you can right-click (Control + click on a Mac) on the page you want and choose "Save As" or "Open image in new window". This will give you a low-resolution JPG of the page.
If you need a higher resolution version of the page image, you can download a full resolution JPEG2000 image by following these steps:
1) Click on the link for "Read Full Screen" or "Find in: Internet Archive" located below the book. You should be redirected to a URL like this: http://archive.org/details/butterflybookpop00smholl. Find the page you want and open the image in a new window. Look at its URL and you should now know the page image's filename, usually something like butterflybookpop00smholl_0008.jpg
2) Go back to the original URL and replace /details/ with /download/ like so: http://archive.org/download/butterflybookpop00smholl
3) Now copy the last part of that URL, which is the book's identifier. Put a / at the end of the URL, then paste in the identifier followed by _jp2.zip/ (n.b. the trailing slash is important). Your URL will now look like http://archive.org/download/butterflybookpop00smholl/butterflybookpop00smholl_jp2.zip/
4) Press Enter, and you should now see a list of all the individual JPEG2000 images in the book. Download the file for the image you want.
Note: not all image viewing or editing programs support JPEG2000 images. This matrix on Wikipedia details which applications can work with JP2s.
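If you do this often, the URL rewriting in steps 2 and 3 can be scripted. The following is a minimal Python sketch of that rewriting only; the function name jp2_listing_url is our own illustration (not part of any Internet Archive tool), and it assumes the address has the /details/<identifier> form shown above. It builds the listing URL but does not download anything.

```python
from urllib.parse import urlparse


def jp2_listing_url(details_url: str) -> str:
    """Build the JPEG2000 listing URL from an Internet Archive /details/ URL.

    Hypothetical helper: it mirrors steps 2 and 3 above and assumes the
    path looks like /details/<identifier>.
    """
    parts = urlparse(details_url)
    # Split the path into its non-empty pieces, e.g. ["details", "<identifier>"]
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "details":
        raise ValueError("expected a URL of the form .../details/<identifier>")
    identifier = segments[1]
    # Step 2: swap /details/ for /download/.
    # Step 3: append <identifier>_jp2.zip/ (the trailing slash is important).
    return f"{parts.scheme}://{parts.netloc}/download/{identifier}/{identifier}_jp2.zip/"


if __name__ == "__main__":
    print(jp2_listing_url("http://archive.org/details/butterflybookpop00smholl"))
```

Opening the printed URL in a browser should take you to the same list of page images described in step 4.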
Technical Specs, aka, FAQ for Library Colleagues. If your questions are not answered here, please send us an email through our Contact Us form.
What platform is the Digital Library running on?
What metadata standards do you use?
What are your image digitization standards?
What type of scanners do you use?
Do you digitize for preservation, and if so, what standards and processes are you following?
The books portion of the Digital Library is built in Drupal, as is all website content created after 2012. Digital Library book image files are stored on and served from the Internet Archive. Book metadata is harvested and stored in Drupal, and the page images are served up via a custom Drupal module using the Internet Archive page turner. Other parts of the Digital Library, particularly older virtual exhibitions, bibliographies, and special collections inventories, were all developed using ColdFusion over MSSQL databases.
Digitized book metadata follows the model set up by the Internet Archive. It includes descriptive records in MARCXML with item and page level data in separate XML files. The MARCXML is generated from our catalog MARC records at the time of scanning. In the Digital Library, we make the item level data available as RIS (for citations) and as Linked Open Data in RDFa. The page level data (enumeration, page type, etc.) is either created by the Internet Archive using their Biblio software, or created by us at time of scanning using our MACAW (Metadata Collection And Workflow) tool, which was developed in-house.
Many (but not all) images in the Galaxy of Images have embedded metadata following the Smithsonian standard for embedded metadata in images. Vocabularies used in descriptions include Dublin Core, LCSH, AAT, SKOS, bibo, and Romaine subject headings for our Trade Literature.
We follow the FADGI guidelines for creation of still images. Because many of our books are digitized by the Internet Archive, which does not follow those guidelines, we don't guarantee that every page image will meet all our internal standards. However, all images in non-folio-size books, excluding "foldouts", should be at least 300 ppi.
Digitization done in-house creates 24-bit color TIFFs as master images and then uncompressed and compressed JPEG2000s as derivatives. Internet Archive uses lossy, compressed JPEG2000s as master images.
More information about our equipment can be found on the Digital Library department's page.
We typically do not digitize for preservation, only for access.