Tuesday, March 6, 2012

MCLS and SCS

Remember these two acronyms: MCLS and SCS.

MCLS Offices in Lansing
Midwest Collaborative for Library Services (MCLS), based in Lansing, Michigan, provides libraries in Michigan and Indiana with a range of services, from group licensing and training to convening and facilitation. And in this case -- innovation.

Sustainable Collection Services (SCS) -- based in Contoocook, New Hampshire; Watertown, Massachusetts; Portland, Oregon; and with major operations in the cloud -- provides decision-support tools and services for deselection and shared print collection management. And in this case -- innovation.

Over the past six months, SCS and MCLS have collaborated on a unique pilot project: developing a shared print monographs program across seven Michigan academic libraries. Print book collections in these pilot libraries vary in size, from 160,000 to nearly 1.2 million titles. Participants are small, medium, and large state universities, including one ARL library. Some are confronting immediate space problems; some are not. But all seven libraries see long-term value in collaborative management of print book collections. In alphabetical order, these forward-looking libraries from the Wolverine State are:

Central Michigan University
Eastern Michigan University
Grand Valley State University
Michigan Technological University
Saginaw Valley State University
Wayne State University
Western Michigan University

You'll be hearing a lot about this project and these libraries in the coming months, and for good reason. MCLS and SCS have pioneered -- and implemented -- a practical shared management solution for low-use print monographs.

Working closely with SCS to compile and analyze their combined collections data, the MCLS pilot group identified 534,000 low-circulation 'title-holdings' to be considered for withdrawal from their collective shelves. (A title-holding is SCS terminology for a library-specific holding of a title held by multiple pilot libraries.) For these same titles, 2 title-holdings of each will be retained within the group. The 534,000 allocable withdrawal candidates were identified based on these criteria:
  • 3 or fewer circulations since 1999
  • Held by 3 or more pilot libraries
  • Published or added before 2005
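
As a rough illustration only, the three criteria and the two-copy retention rule can be sketched in Python. This is not the SCS allocation algorithm, which also balances space targets, collection size, and equity across libraries. The `TitleHolding` record, its field names, and the "withdraw lowest-circulation holdings first" rule are all assumptions made for this sketch, and "published or added before 2005" is read literally as either date qualifying.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleHolding:
    """Hypothetical record: one library's holding of one title."""
    title_id: str       # assumed shared title key (e.g. an OCLC number)
    library: str
    circulations: int   # circulations since 1999
    pub_year: int
    added_year: int


MAX_CIRCS = 3         # 3 or fewer circulations since 1999
MIN_HOLDERS = 3       # held by 3 or more pilot libraries
CUTOFF_YEAR = 2005    # published or added before 2005
RETAIN_PER_TITLE = 2  # title-holdings kept within the group


def is_low_use(h: TitleHolding) -> bool:
    # Assumption: either date falling before the cutoff qualifies.
    old_enough = h.pub_year < CUTOFF_YEAR or h.added_year < CUTOFF_YEAR
    return h.circulations <= MAX_CIRCS and old_enough


def withdrawal_candidates(holdings):
    """Return holdings that meet the criteria while leaving two copies per title."""
    by_title = defaultdict(list)
    for h in holdings:
        by_title[h.title_id].append(h)

    candidates = []
    for group in by_title.values():
        if len({h.library for h in group}) < MIN_HOLDERS:
            continue
        # Simplification: offer the least-circulated holdings first.
        eligible = sorted(
            (h for h in group if is_low_use(h)),
            key=lambda h: (h.circulations, h.library),
        )
        max_withdrawals = len(group) - RETAIN_PER_TITLE
        candidates.extend(eligible[:max_withdrawals])
    return candidates
```

Under these assumptions, a title held by four libraries yields at most two withdrawal candidates, and a title held by only two libraries yields none.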

Allocation of withdrawal candidates (and corresponding assignment of retention commitments) among seven libraries proved a complex process. The desire to withdraw title-holdings with the fewest circulations had to be balanced against the withdrawal targets of those libraries needing space immediately. The relative size of collections also had to be factored in. Equity had to be defined and assured. After 15 iterations (!), SCS established an allocation algorithm that largely satisfied these objectives. (We'll be protecting *that* like the original formula for Coke!)

SCS is now producing lists of withdrawal candidates and retention commitments for each library. The lists will be completed by mid-March, and pilot libraries will be free to act on both fronts, in tandem with the Memorandum of Understanding now under construction. In its simplest terms, this project represents a clear example of the power of collaboration on shared print book collections. Working together provided both more collection security and more opportunity for withdrawals than any library acting alone.

This project was initially suggested by Doug Way, Head of Collections at Grand Valley State University. His idea surfaced after a standalone SCS deselection project at GVSU, when it became obvious that cooperative action within Michigan would lead to a more comprehensive and stable regional solution. Randy Dykhuis, Executive Director of MCLS, agreed to solicit participants, mediate business arrangements, and provide coordination, communication, and facilitation. SCS agreed to undertake development of the requisite tools, data, and services, and in July 2011 began loading data from all seven libraries.

We have learned a great deal in the intervening months. MCLS, the pilot libraries, and SCS all plan to speak and write about the process and results. [Contact us if you're interested in a presentation.] While we faced several operational and organizational challenges, we have mostly wrestled those to the ground. With minor reservations and many ideas for improvement, all parties consider this phase of the project a success. We are eager to tell the story and to help other groups move forward. For now, a few highlights:
  • 3.8 million bib records, plus circulation and item data, were extracted from two different ILS systems, loaded to SCS servers, normalized, and compared to WorldCat & HathiTrust.
  • Comparable circulation data existed for an 11-year period: 1999-2011. 1.74 million title-holdings (46%) did not circulate during that time. Normalization of circulation data is a major challenge.
  • 1.36 million unique titles (36%) were held by the group. Even the smallest library held more than 40,000 unique titles.
  • 989,000 titles (26%) were held by 4 or more pilot participants.
  • 2.36 million (62%) title-holdings showed more than 100 WorldCat holdings. 2.93 million (77%) showed more than 50 WorldCat holdings.
  • 1.57 million (41%) title-holdings were HathiTrust in-copyright titles. 131,000 (3%) were HathiTrust public domain titles.
    This data, combined and recombined in various ways, allowed MCLS pilot libraries to gauge the impact of different deselection scenarios. The group has proceeded cautiously but steadily, breaking a new trail for themselves and others who decide to follow. Group decisions have occurred relatively quickly, in part because the choices rest on clear data. (This is especially encouraging, since this was an ad hoc group, formed specifically for this project.)

    We experienced (and at times created) data errors, but each in turn has been corrected, clarified, or restated as necessary. There are still policy issues to be worked through, especially around retention commitments. But together, MCLS, the pilot libraries, and SCS have made solid progress in creating an infrastructure for the shared management of print monographs.

    The whole project has been and continues to be a remarkably pleasant experience. So, from the Granite State, the Bay State, and the Beaver State (components of the virtual SCS organization) to our collaborators in the Wolverine State, we say thank you for living up to Michigan's motto: "Si Quaeris Peninsulam Amoenam Circumspice" ;-) Or to paraphrase the translation of another Latin saying on behalf of the MCLS-SCS enterprise: 'We came. We crunched. We concurred.' We look forward to the next steps.

    Thursday, March 1, 2012

    Data with Benefits

    Initial batch data extract from library
    Sustainable Collection Services (SCS), the company I run with three business partners, provides decision-support for print monographs deselection. SCS processes are built on data and batch processing. We first import the library's bibliographic, item, and circulation data. That data is then normalized, cleansed, and compared to other data sets such as HathiTrust, WorldCat, peer libraries, and authoritative lists. Library-defined rules then operate against the resulting superset of data, enabling selectors or administrators to gauge the effect of different deselection criteria. Ultimately, candidate lists for withdrawal and preservation are produced.

    That is the service we planned to build. It is the service we actually have built and applied to numerous library projects over the past year. But it turns out to be only part of our business. Working with large monographs data sets also creates opportunities for validation, remediation, analysis, and batch processing. Initially, we regarded these as side benefits. Now we are beginning to think of them as integral to the overall SCS service. Consider some simple examples:

    • Missing or Invalid OCLC Control Numbers: It is fairly common for some portion of a library's cataloging records to lack OCLC numbers. In other cases, those numbers may be truncated or malformed. For records without valid OCLC numbers, SCS uses a combination of LCCN and string-similarity matching to identify likely record matches and corresponding control numbers. These can be returned to the library in a batch to enable update of its catalog. 
    • OCLC Holdings Not Set: As SCS queries the WorldCat API to look up summary and peer holdings, it becomes apparent that in some cases the library's own holding has not been set. We can report these instances to the library, and produce a list that enables batch holdings update -- sort of a miniature reclamation project.
    • Profile of a Group Collection: In one recent project with a pilot group of seven libraries, SCS identified uniquely-held titles for each participant, as well as the degree of overlap on all others. Combined with corresponding circulation data, this enabled identification of a sweet spot for shared print commitments. There are many possibilities in this area.
    • Print/E-Book Overlap: Provided SCS has a library's records for both print and electronic books, it is increasingly possible to determine whether a low-circulation print title is also held as an e-book. There are many caveats here (e.g., it may be important to distinguish whether the e-book is owned, rather than simply available as part of a package or a patron-driven acquisition record). But this overlap is of interest to many libraries.
    • FRBR-on/FRBR-off: Edition matching is a critical element of deselection and collection analysis. For archiving and preservation purposes, exact matches are imperative. For user purposes, exact matches are sometimes important and sometimes not. SCS holdings lookups start with FRBR groupings off, a conservative approach that assures edition-specific matches. For titles that return few holdings, we then re-run the lookups with FRBR groupings on, returning these "softer" matches to the library for review.
    • Batch Processing Support: Deselection projects create record maintenance work, regardless of whether titles will be transferred or withdrawn. Some record maintenance steps (e.g., suppression, location changes) can be completed as batch processes, based on lists that include local control numbers and necessary data elements. Often, SCS can produce labor-saving lists of this sort from the data we hold. 
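
To make the first item above more concrete, here is a minimal Python sketch of OCLC number validation followed by an LCCN lookup confirmed by title similarity. It is an illustration under stated assumptions, not SCS's actual matching code: the record layout, the `reference_by_lccn` lookup table, the simplified LCCN normalization, and the 0.9 similarity threshold are all hypothetical. Only Python's standard `re` and `difflib` modules are used.

```python
import re
from difflib import SequenceMatcher

# Accepts common forms such as "(OCoLC)ocm00012345", "ocn123456789", "on1234567890".
_OCLC_PATTERN = re.compile(
    r"^(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?\s*0*([1-9]\d*)$", re.IGNORECASE
)


def normalize_oclc(raw):
    """Return the bare OCLC number as a digit string, or None if malformed."""
    if not raw:
        return None
    match = _OCLC_PATTERN.match(raw.strip())
    return match.group(1) if match else None


def title_similarity(a, b):
    """Similarity ratio (0.0 to 1.0) between two titles, ignoring case and punctuation."""
    def norm(s):
        return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def suggest_oclc(record, reference_by_lccn, threshold=0.9):
    """Return a valid or suggested OCLC number for a record, else None.

    `record` is a hypothetical dict with 'oclc', 'lccn', and 'title' keys.
    `reference_by_lccn` is a hypothetical dict: LCCN -> (oclc_number, title).
    """
    existing = normalize_oclc(record.get("oclc"))
    if existing:
        return existing
    # Simplified LCCN normalization (assumption): drop whitespace, lowercase.
    lccn = "".join((record.get("lccn") or "").split()).lower()
    candidate = reference_by_lccn.get(lccn)
    if candidate is None:
        return None
    oclc_number, ref_title = candidate
    # Accept the LCCN match only if the titles also agree closely.
    if title_similarity(record.get("title", ""), ref_title) >= threshold:
        return oclc_number
    return None
```

A pattern check like this catches malformed numbers but cannot detect a truncated number that still looks valid, so in practice suggested matches would be returned to the library as a batch for review rather than applied automatically.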
    Remediated batch of data en route to library...
    In each project, we encounter additional opportunities to derive new value from the data. In shared print projects, for instance, it will increasingly prove useful to highlight retention commitments as well as withdrawal opportunities. These are most efficiently handled as batch processes. As always, we are limited only by the data itself and our own creativity. We will continue to look for more ways to benefit from the effort that goes into deselection projects. Some solutions may be partial in scope, but in a large data set even partial solutions can save many hours of staff time.