Thursday, June 2, 2011

The Hathi Effect

The HathiTrust database is of fundamental and growing importance to any deselection project. To some degree, this is true for any library, though, as always, membership has its privileges. As of today, 4,788,131 book titles reside in Hathi in secure, TRAC-certified digital form. The academic community as a whole can rest assured that these titles will not disappear from the cultural record. If Hathi offered nothing more than preservation and security of this sort, its value would still be elephantine.

But in fact, the Trust provides a great deal more. As described in Heather Christenson's excellent article [PDF] in the April 2011 issue of Library Resources & Technical Services, the repository "is providing full-text search across more than 2.8 billion pages." In itself, this is a major enhancement to discoverability, but Hathi also provides direct access to full-text content. This occurs to different extents for different classes of users and depends on the copyright status of each title. And while those who contribute most naturally benefit the most, virtually any library can obtain some degree of access to this 'research library at Web scale.'

I've taken the liberty of re-arranging one paragraph from the Christenson article, and adding a comment or two, to highlight the potential for using Hathi as one element in a surrogate collection strategy.

Public Domain Titles: View Online
  • "All public domain titles can be viewed on the web in a page-turner application."
Public Domain Titles: Downloads
  • "Google-digitized public domain volumes are available in a full PDF download to authenticated users from partner institutions"
  • "public domain volumes digitized via Internet Archive and locally by partners are available in full PDF to all."
Public Domain Titles: Printed Versions
  • "Printed versions of public domain books from some partners are now offered via a link within the HathiTrust Interface to print-on-demand service."
In-Copyright Titles: 
  • "Volumes that are in copyright are discoverable via large-scale search, and users may view a list of pages on which their search term appears."
  • Users can also 'find in a library' via an embedded link to WorldCat. [added comment].
The ready availability of digital versions of these titles greatly reduces any risk associated with removing physical copies from library shelves, especially when those physical copies have never circulated. Among the first five libraries with which SCS has worked, Hathi public domain matches range from 3% to 5%, typically not enough to generate significant space savings, but an uncontroversial place to start.
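
For readers who like to see the logic spelled out, here is a minimal sketch of that first step: flag titles that have a public domain match in Hathi and have never circulated, and compute the public domain match rate. Everything in it is an assumption for illustration: the Holding record, its field names, and the use of "pd" and "ic" as rights codes are hypothetical stand-ins, not an export format from any particular ILS or from SCS.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Holding:
    """One title in a local collection (hypothetical record layout)."""
    title: str
    hathi_rights: Optional[str]  # assumed codes: "pd" = public domain, "ic" = in copyright, None = no Hathi match
    circulations: int            # lifetime checkout count from the local system


def low_risk_candidates(holdings: List[Holding]) -> List[Holding]:
    """Return titles with a public domain Hathi match that have never circulated."""
    return [h for h in holdings if h.hathi_rights == "pd" and h.circulations == 0]


def public_domain_match_rate(holdings: List[Holding]) -> float:
    """Return the fraction of titles that have a public domain Hathi match."""
    if not holdings:
        return 0.0
    matched = sum(1 for h in holdings if h.hathi_rights == "pd")
    return matched / len(holdings)
```

A match rate computed this way is the figure that has come in at 3% to 5% for the libraries mentioned above. The candidate list is only a starting point for review, not a withdrawal decision.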

This first step is especially attractive given the convenience of adding Hathi URLs to existing MARC records. Yesterday's announcement that the California Digital Library has opened its HathiTrust SFX Target to the broader SFX community adds yet another convenient access path.
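
As a rough illustration of what adding a Hathi URL to a record involves, the sketch below builds an 856 (Electronic Location and Access) field and appends it to a record unless the same link is already present. The record here is a plain Python dictionary invented for the example, not the API of any MARC library. The handle-style URL pattern, the volume identifier passed in, and the wording of the public note are likewise assumptions to check against local practice.

```python
def hathi_handle_url(volume_id: str) -> str:
    """Build a handle-style link for a Hathi volume (assumed URL pattern)."""
    return f"https://hdl.handle.net/2027/{volume_id}"


def make_856(volume_id: str, note: str = "Full text available via HathiTrust") -> dict:
    """Build an 856 field: indicators 4 (HTTP) and 1 (version of resource)."""
    return {
        "tag": "856",
        "indicators": ("4", "1"),
        "subfields": [("u", hathi_handle_url(volume_id)), ("z", note)],
    }


def add_hathi_link(record: dict, volume_id: str) -> dict:
    """Append the 856 field to record["fields"] unless the same URL is already there."""
    url = hathi_handle_url(volume_id)
    for field in record["fields"]:
        if field["tag"] == "856" and ("u", url) in field["subfields"]:
            return record  # link already present; leave the record unchanged
    record["fields"].append(make_856(volume_id))
    return record
```

The duplicate check matters in practice because batch loads are often re-run, and a second pass should not stack identical links on the same record.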
