Tuesday, December 11, 2012

I Pity the Poor Immigrant

The Great Divide ca. 2012
Bless me, natives, for I have sinned. It has been 159 days since my last blog post. I confess that I have allowed the 'real world' to impinge. This marks me as a digital immigrant, one who views his online presence as something separate; a hapless, fragmented soul who has not managed to integrate his tangible and virtual selves.

I hear that digital natives experience no such cleavage in their psyches. Online activity is undifferentiated from life; posting is merely consciousness made visible. I envy that a bit. For a digital green-card-carrier, maintaining an online presence is more like speaking in a foreign language. It requires extra effort, and fluency can be elusive. Worse, as a digital immigrant of a certain age (is there another kind?), I am hardwired to give precedence to tangible realities and relationships. I cannot seem to listen and tweet simultaneously. I am a late Boomer.

Montreal: Le Trafic Horrible
In recent months, the real-world demands of building a new business have crowded out blogging & tweeting. A surprising number of librarians remain unaware of one's cyber-existence. Finding customers for a new service still requires showing up in person: to meet, to describe, to see, to listen, to articulate, to elaborate, to adjust, to convince--to make the case face-to-face. And showing up has meant grinding out actual miles since July: Anaheim, Amherst, Pittsburgh, Indianapolis, Waterloo, Pomona, Brockport, Northridge, Worcester, Buffalo, Montreal, DC, Sturbridge, Des Moines, Schenectady, Charleston, West Lafayette, Corvallis...and soon enough to Seattle, Columbia, Utrecht and beyond.

Land of the Needle Jets
This is just how it works in the physical world: needle jets, thruways, traffic jams, library after library, spreading the word: "Rethinking Library Resources" and "Data-Driven Deselection for Monographs" and "Shared Print Management." Road food, flight delays, and time zone changes, punctuated by conference calls at highway rest stops, proposal writing, development meetings, negotiations, data discussions and project planning. All of this involves its own pleasures and pains, but seems to exist in a parallel universe, lacking convenient links or for that matter much convenience at all.

Meanwhile, ironically enough, my partners and I at SCS are building a virtual operation to serve a tangible need: cloud-based analytics to help manage print book collections. Amazon Web Services, Google Apps, postgres databases, solr indexes, FTP, DropBox, WebEx, Skype, and Google Hangout, shared screens and a host of other invisible tools are wielded by actual, hard-working humans with a good idea. Like all immigrants, we live and work in two worlds.

Showing up in Sturbridge, MA

So now it's back to work here, in this sphere of blogs and twitterage. Once again we turn our efforts to mastering that second language. We may never speak it like natives, but we will make ourselves heard and understood!

  • Sample & Hold: this blog will continue with comment on profession-wide issues related to deselection and shared print.
  • SCSInsight: our company blog will highlight features, developments, ideas about service, and interesting case studies in conjunction with our library partners.
  • @ricklugg will return to tweeting and re-tweeting on professional topics, with the occasional random comment.
  • @SCSInsight will continue to tweet on projects, trends, benchmarking, and other findings drawn from active library projects.
In the end, we're just singing the immigrant song (minus the bit about 'we are your overlords', of course!). We live in the tangible world. We live in the virtual world. We have to work in both and at both, as we carry on trying to "solve one problem well."


Wednesday, July 4, 2012

Not Dark Yet

Recently, in the course of writing an article on 'Data-driven deselection' for Insights: the UKSG journal, I grew curious about the UK perspective on the future of print monographs. In particular, I wondered if the UK Research Reserve, described, in a quadruple-modifier extravaganza, as a "collaborative distributed national research collection" includes monographs as well as journals. For now, it does not. The UKRR remains focused on print journals, and an 'ambitious target of releasing 100km of shelf space by the end of 2013.'

But the group has indeed considered the question of books, most recently in a June 2011 report entitled Less is more: Managing monograph collections in the 21st century. The report is based on a 'Strategic Management of Monographs Forum' held in London on March 17, 2011. The purpose: "to determine whether there was interest in the library sector for a scheme aimed at de-duplicating monograph collections." More than fifty representatives from higher education, national library organizations, and the British Library considered the history of collaborative collection management efforts (e.g., the Atkinson Report [abstract], which in 1976 recommended 'self-renewing' libraries, with low-use material being discarded to make room for new material).
Forum participants working up 'punchy responses'

Attendees also weighed the potential for shared collections, as suggested in recent initiatives such as the White Rose Collaborative Collection Partnership among the Universities of Leeds, Sheffield and York and the British Library. (This project will be the subject of a more detailed post here soon.) But the main point of the day was to solicit views from many perspectives on deduplicating print monographs. A number of interesting comments surfaced as the group confronted four broad questions:
  1. Is there a need for cooperative strategic management of monographs?
  2. What are the risks and challenges to a collaborative model?
  3. What are the barriers to success?
  4. How might cooperative management be put into practice?
Primary source material, digitized
The document is well worth reading in its entirety, but the responses that particularly caught my eye included the following: (I have consolidated and paraphrased slightly, but have *not* tampered with their promised "punchiness.")
  • The ecology of monographs is complex and fundamentally different than journals.
  • Rationalisation of monographs would greatly affect the humanities research process.
  • There is no strong demand for a large-scale national initiative. Better to build on existing regional initiatives, such as the White Rose collection management project.
  • Libraries cannot afford, either fiscally or reputationally, merely to store collections. Active curation and disclosure are necessary.
  • There is constant pressure on libraries across all sectors to reduce their estate footprint.
  • In these difficult financial times libraries have a responsibility to only house collections which are of value to their institutions.
  • There is a need to articulate a clear vision supported by all stakeholders. Any initiative should encompass the wider issues of strategic collection management and not be limited to deduplication activity. The drivers [to shared print management] need to be broader than financial.
  • There is a risk of missing the window of opportunity for collaboration (as institutions begin to deduplicate independently instead).
At the end of the day, Less is more summarized the sense of the UK academic library community in this manner [emphasis added]:
"The steer from the delegates was though it would be useful to address these issues collaboratively it was not currently top of their institution's priorities [...] Indeed the group decided that now was not the time to commission a scoping study to look at this further."
As measured and rational as this conclusion may be, this 'wait and see' strategy strikes me as somewhat risky. Given what we as a community know about circulation rates (low), collection overlap (significant), and lifecycle management/opportunity costs (high), there is a strong argument for immediate action. Seen from a certain angle, libraries are expending scarce resources for very little return, and, at least in the US, this has not escaped the attention of administrators outside the library. That is not where we want the impetus for change to originate. We want the future of shared print collections to be shaped by library values. If we want enough time to assure that 'doing no harm' (to the collective collection) remains our top priority, it may be better to start now. As Dylan says, 'it's not dark yet, but it's getting there.'


[Photos from UKRR Website: 'Strategic Management of Monographs Discussion Forum' page]

Wednesday, May 30, 2012

Talking with Faculty about Library Collections

Earlier this month, I made my way to Maryville, MO, home of Northwest Missouri State University. It's a lovely drive north from the Kansas City Airport, more and more rural over the course of 70 miles. (Some of you may know it from the 'Brick & Click' Library Symposium held there each fall.)  Even locals describe Maryville as "in the middle of nowhere," but Northwest's B.D. Owens Library might be better described as "in the middle of the action."
 
iPlace: blurring the line between library & classroom
Over the past two years, Dr. Leslie Galbreath, Director of Academic and Library Services, has spearheaded a transformation of the Library's services, spaces, and collections. At peak times, the Owens Library now resembles a busy classroom, with groups of students huddled together around tables and laptops, covering the many whiteboards with outlines, graphs, and diagrams. The Library also houses several academic support departments, including a Talent Development Center, a Writing Center, a Proctoring Center, a Computer Lab, a Center for Information Technology & Education, and a collaborative workspace known as the iPlace. Northwest is seeking and apparently achieving a very close relationship between the Library and the University's teaching & learning services.

"Where Learners and Resources Meet"
Some of the space needed for these services was reclaimed from little-used print journal and reference collections, which have been removed and sold or recycled. Stacks on the first floor were reduced in size and number. Freed space was redesigned based on close observation of user behavior and many conversations with students. New designs include reconfigurable furniture, collaborative study space, casual spaces, individual study rooms and a quiet floor, a popular reading collection, and productive relationships with the academic support centers. Students are expressing their approval with their feet, driving  a 58% increase in gate count over the past two years.

The Owens Library: a happening place
This success raises new questions about how collections--especially print book collections--should be managed in future. At times, there is still more demand for seats than can be met. Meanwhile, as in most libraries, Owens circulation transactions are low and declining. Users largely prefer electronic resources. Like most universities, Northwest will continue to face budget constraints for the foreseeable future. More clear thinking and hard decisions are needed: about the balance between collections and curriculum, collections and budget, collections and user preferences, collections and space. To address these issues, the Library has begun to develop its first formal collection management plan.

Because some weeding has already taken place, Northwest has bought itself enough time to build its plan carefully. The Library's 200,000 or so books occupy shelves on two floors; most are at or below the 75% capacity recommended for efficient operation. The stacks are in excellent shape, with some room for growth, but also with potential for consolidation. Because space pressure is not the main driver at the moment, there is time for a campus-wide conversation about the future of collections. Even more importantly, that conversation can begin by focusing on user needs and budget realities rather than stack space. The Library also has time to engage its stakeholders. In fact, that was why I was invited to Maryville: to kick off a discussion with teaching faculty and librarians about the future of collections.

Here's an impressive fact about Northwest Missouri State's faculty: nearly 30 of them showed up on what was technically a day off to spend several hours thinking about library collections. The morning consisted of  two presentations, each followed by a fairly active discussion. My partners and I at Sustainable Collection Services (SCS) created these sessions in the course of developing our own business, and evolving our thinking about what we call "actionable collection intelligence."  The first, called "Rethinking Library Resources", outlines why we need to reconsider current practice: space pressure, low circulation, digital archiving, high levels of print redundancy, and the viability of shared print collections. The second, on "Data-Driven Deselection", describes how print collections might be drawn down safely and cost-effectively, working with a library's or consortium's circulation and holdings data.

For the most part, faculty have not previously heard these issues framed in this manner. This was the third time I've had an opportunity to speak directly to faculty. Reactions to the message vary, but the conversation is always interesting.

Rick Lugg of SCS talks about the future of print collections with NW Missouri faculty & librarians
At Northwest, several specific concerns surfaced as I suggested that print collections could continue to be drawn down with negligible impact on users. I've generalized these somewhat, to incorporate comments from discussions with faculty in other institutions:
  • Collection use is a flawed metric. This objection has surfaced in every discussion I've had with faculty. The fact that a book circulates does not necessarily make it more valuable than a book that does not circulate, only more popular. Circulation alone should not determine whether an item stays or goes.
  • In-Library use is under-counted. This is another common (and valid) comment. Users consult many books while working in the library, but most are not checked out. For libraries that do re-shelving counts, this use can be captured. Estimates of in-house use vary widely, with some libraries reporting 10 in-house uses for every circulation; a more common estimate is 2-3. We have no disagreement here. Re-shelving counts should be incorporated into use data whenever they are available.
  • Qualitative measures are more important than quantitative measures. All uses are not equal. Sometimes a work is "essential" to an argument or piece of research. These should be weighted more heavily. Some books are better than others. The library should keep the best books on a topic, not the ones that happen to be used the most. Conversely, some books are poorly researched or written, and should be removed on that basis, regardless of use.
  • A library must have books. Someone referred to this as 'books for looks'. Students need to know what it feels like to be surrounded by books, and to witness the extent and value of the scholarly record. A library collection represents that value tangibly.
  • Accrediting bodies and curriculum committees require books. While this literal interpretation of library resources is no longer true in most disciplines, the perception remains that these bodies care about how much is published -- and collected -- in their disciplines.
  • eBook versions are not always adequate. They work well in some disciplines but not in others. A History professor noted how difficult it is to take notes from/in eBooks. On the other hand, the ability to mount chapters for use in online classes is important.
  • Books have artifactual value. Some books have value over and above their content. A sequence of editions, the use of a title over time, and the object itself may warrant retention.
  • Shared print involves delays. Northwest is fortunate to be part of MOBIUS, a statewide resource-sharing network with a well-developed delivery infrastructure. But even a relatively short 48-hour wait for delivery can disrupt research, where a local copy on the shelf would allow it to continue uninterrupted. ILL can take even longer. While a library can't hold everything on its shelves, more is better.
All of these points warrant consideration. As I've noted in a previous post, data-driven deselection can only be as good as the data. We should do our best to create the fullest picture of collection use. We should attempt to develop and implement qualitative measures, drawing from core lists, awards, key authors, and faculty recommendations. But this has to be balanced with the fact that title-by-title consideration is simply not possible. We need techniques that rely on data, rules, and patterns. Faculty input can make collection management better, and their comments should give us pause, and cause us to adapt. But they should not stop us entirely.

NW Missouri State faculty & librarians prepare to visit the stacks

Monday, May 14, 2012

A Spa for Books

To paraphrase Jon Landau from 1973, I have seen library collections' future, and its name is ReCAP. Last week, my partner Ruth Fischer and I had occasion to visit the Research Collections and Preservation Consortium, a high-density offsite storage facility owned jointly by Princeton, Columbia, and New York Public Library. It's always a pleasure to see a well-run operation, and Executive Director Eileen Henthorne and her crew have really honed this one since it opened in 2002. 

10 million books in there!
Facts and figures
  • Current capacity: 10 million volumes (96% full)
  • 2 new modules under construction will hold another 8-10 million volumes
  • On an average day, ReCAP takes in 3,000 items from its libraries; at peak load-in, this reached 8,000/day.
  • 600-700 items per day are retrieved to fill patron requests; 24-48 hour fulfillment
  • Annual retrieval rate is under 2%, indicating that the right items have moved offsite
  • ReCAP has never lost an item!
  • Climate is controlled at levels supporting 300-year preservation: 50-59 degrees; 35% relative humidity. In the colorful phrase of Jim Neal, Vice Provost at Columbia, ReCAP is "a spa for books." The facility is powered by solar panels on its roof.
  • The facility is as clean, organized, and efficient as the library of our dreams
  • While it is not directly browsable, it is immensely reliable and useful. This approach, however poorly it fits our romantic view of libraries, is exactly what we need to manage low-use tangible collections.
A Brief Tour in Pictures

ReCAP processing room


Incoming volumes are sorted by size
and placed in one of 16 sizes of acid-free cardboard trays

Barcodes on book, tray, and shelf manage the inventory--there is no bib data in the ReCAP system
After sizing, volumes are accessioned by scanning barcodes on piece and tray--system assigns a row/shelf

The entire accessioning process is repeated by a different worker to assure accuracy
Trays are then placed in the storage modules 
Every volume has a home address of several barcodes
Bowling alley wax is used on shelves to assure smooth movement of trays
Executive Director Eileen Henthorne & her staff know where *everything* is
Tools of the trade: barcode readers re-charging


Cart with forklift slots in base 
Needed because shelves are 30' high




Narrow-aisle picker vehicle with carts, trays, and visitor aboard

Books en route from ReCAP to users: 24 hours from request to delivery
New modules under construction

Tuesday, April 17, 2012

Measuring Collection Use

Despite Neil Young's warning that 'numbers add up to nothing', sometimes they have to suffice. In the data gathering that underpins our deselection and shared print projects, we at SCS spend a lot of time looking closely at circulation statistics. Several factors influence what can be gleaned here:
  • ILS: Different library systems capture and store circulation data in different ways.
  • Duration: Most libraries retain circulation data back as far as their last system migration, though a small percentage port historical checkouts over as part of the data transfer process.
  • Definitions: Often a checkout is just a checkout. This usually includes direct borrowing and ILL transactions, but not always. Some libraries also use checkouts to monitor workflows, charging books out to Acquisitions, Cataloging, Bindery, etc.
  • Transaction dates: Some systems capture only circulation totals. In others, it is possible to learn the date of the most recent circulation transaction.
Not to put too fine a point on it, there are no standards for circulation data. While this imposes some limits on the analysis that can be done for an individual library, it can be even more problematic in shared print projects, where it is necessary to derive a common basis for circulation activity. Even with its limitations, though, circulation data provides an important objective measure of collection use.
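One way to picture a "common basis" is to convert each library's raw checkout counts into a rate per year of retained data, leaving out internal workflow charges. The sketch below is only an illustration under stated assumptions, not a description of how SCS or any ILS actually does this: it assumes transaction-level records are available (some systems keep only totals), and the `Checkout` record, the `WORKFLOW_PATRONS` set, and the function name are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical patron types that some libraries use to charge books out
# to internal departments (workflow tracking, not real use).
WORKFLOW_PATRONS = {"acquisitions", "cataloging", "bindery"}


@dataclass
class Checkout:
    barcode: str
    year: int
    patron_type: str  # e.g. "student", "faculty", "ill", "cataloging"


def checkouts_per_year(checkouts, barcode, first_year, last_year):
    """Average annual patron checkouts for one item over the years retained."""
    years = last_year - first_year + 1
    if years <= 0:
        raise ValueError("last_year must not precede first_year")
    count = sum(
        1
        for c in checkouts
        if c.barcode == barcode
        and first_year <= c.year <= last_year
        and c.patron_type not in WORKFLOW_PATRONS
    )
    return count / years


# Invented sample data: one item with two patron uses and one workflow charge.
history = [
    Checkout("b1", 2008, "student"),
    Checkout("b1", 2010, "cataloging"),  # excluded: internal workflow
    Checkout("b1", 2011, "ill"),
    Checkout("b2", 2011, "faculty"),
]
rate_b1 = checkouts_per_year(history, "b1", 2007, 2011)
```

A rate of this kind lets a library with five years of retained data be compared with one that has fifteen, though it cannot repair differences in what each system counted as a checkout in the first place.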

No one would argue that circulation activity constitutes the full picture, of course. As Walt Crawford and Michael Gorman point out in Future Libraries: Dreams, Madness and Reality: "To be effective politically, it is vital to record the totality of collection use." [WorldCat record]. This is easily confirmed in conversation with faculty and other library users, who often assert with some vigor that circulation represents only a partial picture of use. They base this, reasonably enough, on their own habits, and there is some quantitative support for this in cases where statistics for in-house use are available.

In one library we worked with recently, a 2-month sample of reshelving counts indicated that ten in-house uses occurred for every circulation. That is the highest number we have encountered. A more commonly-reported level is echoed in Future Libraries: "When libraries have counted in-library use, usually omitting pure browsing, the numbers are 2-3 times as high as actual circulations."

Especially when deselection is being considered, recognition of any and all use is important. The last thing we want to do is remove something from the shelves that is actually used. How can we be certain we're getting the fullest picture? 

The obvious answer is to institute or re-institute reshelving counts. This is a frustrating conclusion. It seems completely counter-intuitive and backward-looking to add this sort of work to a library's daily operations in the digital era. Reshelving counts are time-consuming, and require cooperation from users. They shift the focus to print when all signs point toward declining use and value.

But we need the data. We are beginning to make long-term decisions about the future of print collections, and we need those decisions to be as informed as possible. This means we need to capture in-house use--at least in some form.

The simplest and most accurate approach is to provide and promote regular reshelving counts--to identify and count all books used in the library. Stacks workers can scan the barcode and either tally in-house use separately or count it as a checkout. This approach ties use statistics to specific titles, which is optimal.

Some libraries will find this investment hard to justify or sustain. In those cases, a sampling approach could be adopted: run reshelving counts for one week per quarter or one week per semester. This less labor-intensive approach could provide sufficient data for extrapolation; i.e., to calculate an estimated rate of in-house use that could augment actual circulation statistics.
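As a rough illustration of that extrapolation, the sketch below derives a ratio of in-house uses to checkouts from the sample weeks and applies it to a full year of circulation. The function names and all figures are invented for the example, and it assumes the sample weeks are reasonably representative of the year as a whole.

```python
def estimate_in_house_ratio(sample_reshelves, sample_checkouts):
    """In-house uses per checkout, as observed during the sample weeks."""
    if sample_checkouts <= 0:
        raise ValueError("need at least one checkout in the sample period")
    return sample_reshelves / sample_checkouts


def estimate_total_use(annual_checkouts, ratio):
    """Annual checkouts plus the in-house use implied by the sampled ratio."""
    return annual_checkouts + annual_checkouts * ratio


# Hypothetical sample: 900 reshelved items against 400 checkouts
# during the counting weeks, applied to 20,000 checkouts per year.
ratio = estimate_in_house_ratio(sample_reshelves=900, sample_checkouts=400)
total = estimate_total_use(annual_checkouts=20000, ratio=ratio)
```

Note the trade-off: an estimate like this adjusts use at the collection (or subject) level only. Unlike full reshelving counts with barcode scans, it says nothing about which specific titles were consulted.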

Wednesday, April 11, 2012

Art Books: A Special Case?

Imminent weeding, storage, or transfer projects often prompt vigorous discussion about the value of local print collections. As we’ve considered in a recent post, Browsing Now, the prospect of losing direct, hands-on access to books is of particular concern to students and scholars in the Humanities. They argue that library stacks constitute the ‘laboratory’ for their disciplines, and that the ability to browse onsite collections is essential to their work. There are pros and cons to this position, but at heart it asserts that the values of ‘library as place’ and ‘library as collection’ are tightly linked. We’ll continue that debate another day.

Today’s question is related but narrower. Are there disciplines that warrant special treatment—i.e., exemption from weeding, storage, sharing, or consolidation—because the characteristics and use of their literature are different? In other words, are there subjects where locally-held print books are so superior to the alternatives that they must be retained in situ? Let’s consider a prime contender for exceptional treatment: books in Art and Photography. And let’s consider a current real-world example.

Wesleyan University has decided to close its Art Library and move the 25,000 books now held there to the Main Library. There are good reasons for this, including the fact that this move will unite a collection that is now split between the two buildings. These are well articulated in a recent article in the student newspaper, and in the excellent WesWeeding blog maintained by the Library. 

Still, faculty and students are concerned. This is a substantial and concrete change, which will directly affect convenience and user work habits. University Librarian Pat Tully and her staff have kept all activity and dialogue transparent, and have managed to engage the campus in a productive discussion of this difficult topic. And while all disciplines will be affected by this move (since the main library itself must be weeded to accommodate the transferred art books), art students and faculty will face more change than most.


It’s important to note that there are really two separate components to this change. One concerns the loss of a specialized, conveniently-located branch facility, close to studios and classrooms. In a sense, a branch library of this sort is embedded among its primary users. Its value as an informal community center built around shared interests is clear, but this is not unique to Art. A branch library for Physics, Education or Music offers the same advantages. The presence of relevant library resources may enhance this embedded environment, but social and collegial activity would occur—and will continue to occur-- even without shelves full of books. Since this is not fundamentally a collections issue, we’ll set it aside for now.

For our purposes, a second set of questions is of more direct interest. Are art books actually different than books in other subjects? Are books in art and photography used differently than books in other disciplines? And if so, what should we do about it? Part of my curiosity here stems from observation of my step-daughter Emily’s library use. Initially interested in anthropology, she mostly used the library’s online resources. When she became a studio art major, however, stacks of print books began to appear on her coffee table with some regularity. Why?
  • Art books are different. Obviously, books in art & photography are more likely to include images. Print images are typically of higher resolution and greater fidelity than digital images. A recent Slate article by Jim Lewis notes that "a well-produced photography book might get as high as [...] 600 dots per inch, [...] about 8 times finer than an Apple monitor. The result, especially on high-quality paper, is much greater detail and a much subtler range of tones." Even if digital images were of comparable quality, however, many are simply not available in that form. As publishing consultant Emily Williams notes in the Digital Book World blog, "books are complicated bundles of copyrights." Just because a publisher has the print book rights to an image doesn't mean it has the digital book rights. In short, print books still rule in the arts.

  • Art books are used differently. Because image quality is so important, art books are used not only for close examination of a work, but also in support of studio assignments. Art students often bring books into the studio, propping them open next to their easels for inspiration or to complete an 'in the style of' assignment. While similar portability may be possible with a PC or an iPad, the quality of those images is likely to be unsatisfactory. So at least some art books become tools for use in the studio. 

  • So what should we do? The simple answer is to treat art books more conservatively. For deselection, this may take care of itself, since print books in art do tend to circulate more actively than some other disciplines. Because of rights and resolution issues, the transition to eBooks will be slower, so most libraries will continue to purchase new print books in art. But we might benefit from monitoring user behavior more closely. This could include instituting re-shelving counts, to capture in-house use. It would also be useful to know when the quality of digital images reaches the point where a tablet replaces a book propped up next to the easel. 
Caution: Images At Work

Monday, April 2, 2012

Practicing Collection Management


For the past 12 years, my partner Ruth Fischer and I have consulted for academic libraries on workflows and organizational redesign. One unexpected result of that experience is that I became deeply uncomfortable with the concept of ‘best practices.’ To perform well, systems or organizations must continually adjust to changing conditions. And while disciplined attention--i.e., practice--is essential, no practice fits every organization. There are always local realities that must be accommodated. The process of adaptation is never complete. There is always more to learn; the environment remains dynamic.

For an individual organization, then, there are no best practices. There are only good practices, modified to fit a specific set of circumstances, always with one eye on the future, the budget, and previous investments. At most, these are best possible practices.

All aspects of library work are changing rapidly. Within the sphere of collection management, the very concept of 'collections' is under scrutiny, as electronic resources dominate, as patron-driven acquisitions gains traction, and as library space is wanted for other purposes. The value of local print collections is changing, as we consider rates of circulation and in-house use, and as our awareness of redundancy and life-cycle management costs grows. Paradoxically, this makes the practice of collection management more important--and interesting--than ever.

That's why it's always heartening to see good work in progress. A case in point: in preparation for a recent visit to Colgate University, the Library sent me a copy of its Collections Management Working Group's Final Report.

The Case Library and Geyer Center for Information Technology, Colgate University

University Librarian Joanne Schneider formed the group in June 2011 to address emerging space and collections challenges. Colgate had opened an automated storage & retrieval system (ASRS) in 2007. Named LASR, it now holds 375,000 volumes, and was designed to provide capacity for 30 years' collection growth. (This September 2011 Library Journal article provides a fine description of the Colgate implementation.) While overall shelf and storage space remains ample, the distribution of print materials across subjects and locations has created some unexpected congestion in certain areas. In particular, the shelves which house LC Classes A-H are currently at 84% of capacity. As described in a previous post, 75% capacity is considered optimal. Even when adequate resources are available, the process of adaptation is never complete.
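To make the gap between 84% and 75% concrete, here is a back-of-the-envelope sketch of how many volumes would need to be withdrawn or transferred to bring a stack range down to the target fill rate. The function name and the shelf capacity of 100,000 volumes are hypothetical; the Colgate report's actual figures are not given here.

```python
def volumes_to_relocate(shelf_capacity, fill_rate, target_rate=0.75):
    """Volumes to withdraw or transfer so a range reaches the target fill rate.

    shelf_capacity is the number of volumes the range holds when 100% full;
    fill_rate and target_rate are fractions between 0 and 1.
    """
    if shelf_capacity < 0:
        raise ValueError("shelf_capacity must be non-negative")
    if not 0 <= fill_rate <= 1 or not 0 <= target_rate <= 1:
        raise ValueError("rates must be between 0 and 1")
    current = shelf_capacity * fill_rate
    target = shelf_capacity * target_rate
    # A range already at or below target needs nothing moved.
    return max(0, round(current - target))


# Hypothetical range with room for 100,000 volumes, currently 84% full.
to_move = volumes_to_relocate(shelf_capacity=100000, fill_rate=0.84)
```

Under those assumed numbers, roughly nine volumes in every hundred on the shelves would have to go elsewhere before the range returned to the 75% working level.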

The Working Group's analysis and response strikes me as an excellent piece of work, adapting the Library's collection management plan to its changing realities. A few highlights (paraphrased with the Library's permission) suggest the intelligence of the group's approach: 
  • Cross-functional membership: the Working Group was chaired by the Head of Collection Development, and included representatives from Reference, Cataloging, Government Documents, Stacks Management, and the LASR facility. All perspectives and workflows had a voice.
  • Balancing of priorities: stewardship of collections, space, and the user experience all carried weight in the group's recommendations.
  • Projections of collection growth: while it is enormously difficult to predict the future, it is important to try. The Working Group made thoughtful assumptions, looking toward the next 15-20 years, working in 5-year increments, in several areas: 
    • Collections budgets will remain flat or near flat
    • Electronic resources will continue to claim a greater share of materials money
    • E-book adoption will be slower than was predicted as recently as 2010
    • There will be more reliance on regional shared print strategies
    • Regular weeding and transfers will help control collection growth 
  • Projections of collection capacity: At a macro level, Colgate's LASR has room through 2030. But the details of 12-inch bins vs. 10-inch bins, monographs vs. journals, the dispersion of print resources in open stacks, and decisions about government documents will all affect how and when that space is used. Even if collection capacity is adequate for the long term, Colgate believes weeding still has a role, especially as shared print efforts become more common.
  • Workflow Awareness: Transfers to and from an ASRS take time and effort, as does large-scale shifting of collections in open stacks, once weeding or transfers are completed. In long-range thinking, it is important to minimize unnecessary materials movement and record maintenance. 
  • Weeding Criteria Defined: For books, the Working Group agreed on specific--and relatively conservative--criteria. Nonetheless, 132,406 potential withdrawal candidates were identified:
    • Low-use: used 0-1 times [total checkouts <2]
    • Relevance: not checked out in past 15 years; not on reserve in past 10 years
    • Age: 20 years or older
    • Available from 2 or more consortial partners
    • Active items: not suppressed, missing, billed, etc.
    • Main book collections: not Special Collections, reference, etc.
    • Multi-volume works: excluded from consideration
  • Weeding Criteria Adjusted: Initial estimates were reduced by 20% to account for Colgate's uniquely-held titles within its consortium, and other reasons to retain (what we at SCS call 'title protection rules'). The group also recommended that its consortium agree in principle to a last-copy policy before relying on it as back-up for weeded materials.
  • Transfer Criteria Defined: Because titles in this category are going from stacks to LASR, criteria are looser -- the books will remain in the building. These criteria yielded 48,166 items.
    • Use: fewer than 6 total uses
    • Relevance: Not checked out in past 10 years; not on reserve in past 5 years
    • Age: 15 years or older
    • Multi-volume works: excluded from consideration
  • Manageable Timelines: The workload generated by the Group's report is formidable. They suggest staging the work over several years, with specific ideas about sequencing and load balancing. They envision collection management of this sort as an ongoing process, on regular cycles of five years -- or integrated into annual workflows in smaller increments. 
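
The weeding and transfer criteria listed above translate naturally into rules applied to item records. The following is a minimal sketch, not Colgate's or SCS's actual implementation. The `Item` fields, the function names, and the handling of boundary years (an item last used exactly 15 years ago counts as "not used in the past 15 years") are all assumptions made for illustration, and total checkouts stand in for "uses".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical record layout; a real ILS export will differ.
    total_checkouts: int
    last_checkout_year: Optional[int]  # None if never checked out
    last_reserve_year: Optional[int]   # None if never on reserve
    pub_year: int
    partner_copies: int                # copies held by consortial partners
    active: bool                       # not suppressed, missing, billed, etc.
    main_collection: bool              # not Special Collections, reference, etc.
    multi_volume: bool


def idle_for(last_year: Optional[int], years: int, current_year: int) -> bool:
    """True if the event never happened, or last happened at least `years` ago."""
    return last_year is None or current_year - last_year >= years


def is_weeding_candidate(item: Item, current_year: int) -> bool:
    return (
        item.total_checkouts < 2                                  # low use: 0-1 checkouts
        and idle_for(item.last_checkout_year, 15, current_year)   # relevance
        and idle_for(item.last_reserve_year, 10, current_year)
        and current_year - item.pub_year >= 20                    # age
        and item.partner_copies >= 2                              # available elsewhere
        and item.active
        and item.main_collection
        and not item.multi_volume
    )


def is_transfer_candidate(item: Item, current_year: int) -> bool:
    # Looser thresholds: the book stays in the building, in LASR.
    return (
        item.total_checkouts < 6
        and idle_for(item.last_checkout_year, 10, current_year)
        and idle_for(item.last_reserve_year, 5, current_year)
        and current_year - item.pub_year >= 15
        and not item.multi_volume
    )


# The 20% reduction applied to the initial weeding estimate.
initial_candidates = 132_406
adjusted_candidates = round(initial_candidates * (1 - 0.20))
print(f"Adjusted weeding estimate: {adjusted_candidates:,}")
```

Keeping each criterion on its own line makes it easy to loosen or tighten one threshold and re-run the counts, which is the kind of scenario testing the report describes.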

The Group's report concludes with a statement of the benefits expected if their ideas are adopted. There are, of course, no guarantees that all of their assumptions and reasoning are correct. Everything may be subject to change. But this exercise, adapted for local use, is one that many libraries could benefit from. And it's the regular attention to these matters, not perfect results, that defines the practice of collection management.

Tuesday, March 6, 2012

MCLS and SCS

Remember these two acronyms: MCLS and SCS.

MCLS Offices in Lansing
Midwest Collaborative for Library Services (MCLS), based in Lansing, Michigan, provides libraries in Michigan and Indiana with a range of services, from group licensing and training to convening and facilitation. And in this case -- innovation.

Sustainable Collection Services (SCS) -- based in Contoocook, New Hampshire; Watertown, Massachusetts; Portland, Oregon; and with major operations in the cloud -- provides decision-support tools and services for deselection and shared print collection management. And in this case -- innovation.

Over the past six months, SCS and MCLS have collaborated on a unique pilot project: developing a shared print monographs program across seven Michigan academic libraries. Print book collections in these pilot libraries vary in size, from 160,000 to nearly 1.2 million titles. Participants are small, medium, and large state universities, including one ARL library. Some are confronting immediate space problems; some are not. But all seven libraries see long-term value in collaborative management of print book collections. In alphabetical order, these forward-looking libraries from the Wolverine State are:

Central Michigan University
Eastern Michigan University
Grand Valley State University
Michigan Technological University
Saginaw Valley State University
Wayne State University
Western Michigan University

You'll be hearing a lot about this project and these libraries in the coming months, and for good reason. MCLS and SCS have pioneered -- and implemented -- a practical shared management solution for low-use print monographs.

Working closely with SCS to compile and analyze their combined collections data, the MCLS pilot group identified 534,000 low-circulation 'title-holdings' to be considered for withdrawal from their collective shelves. (A title-holding is SCS terminology for a library-specific holding of a title held by multiple pilot libraries.) For these same titles, 2 title-holdings of each will be retained within the group. The 534,000 allocable withdrawal candidates were identified based on these criteria:
  • 3 or fewer circulations since 1999
  • Held by 3 or more pilot libraries
  • Published or added before 2005

Allocation of withdrawal candidates (and corresponding assignment of retention commitments) among seven libraries proved a complex process. The desire to withdraw title-holdings with the fewest circulations had to be balanced against the withdrawal targets of those libraries needing space immediately. The relative size of collections also had to be factored in. Equity had to be defined and assured. After 15 iterations (!), SCS established an allocation algorithm that largely satisfied these objectives. (We'll be protecting *that* like the original formula for Coke!)

SCS is now producing lists of withdrawal candidates and retention commitments for each library. The lists will be completed by mid-March, and pilot libraries will be free to act on both fronts, in tandem with the Memorandum of Understanding now under construction. In its simplest terms, this project represents a clear example of the power of collaboration on shared print book collections. Working together provided both more collection security and more opportunity for withdrawals than any library acting alone.

This project was initially suggested by Doug Way, Head of Collections at Grand Valley State University. His idea surfaced after a standalone SCS deselection project at GVSU, when it became obvious that cooperative action within Michigan would lead to a more comprehensive and stable regional solution. Randy Dykhuis, Executive Director of MCLS, agreed to solicit participants, mediate business arrangements and provide coordination, communication, and facilitation. SCS agreed to undertake development of the requisite tools, data, and services, and in July 2011 began loading data from all seven libraries.

We have learned a great deal in the intervening months. MCLS, the pilot libraries, and SCS all plan to speak and write about the process and results. [Contact us if you're interested in a presentation.] While we faced several operational and organizational challenges, we have mostly wrestled those to the ground. With minor reservations and many ideas for improvement, all parties consider this phase of the project a success. We are eager to tell the story, and help other groups move forward. For now, a few highlights:
  • 3.8 million bib records, plus circulation and item data, were extracted from two different ILS systems, loaded to SCS servers, normalized, and compared to WorldCat & HathiTrust.
  • Comparable circulation data existed for an 11-year period: 1999-2011. 1.74 million title-holdings (46%) did not circulate during that time. Normalization of circulation data is a major challenge.
  • 1.36 million unique titles (36%) were held by the group. Even the smallest library held more than 40,000 unique titles.
  • 989,000 titles (26%) were held by 4 or more pilot participants.
  • 2.36 million (62%) title-holdings showed more than 100 WorldCat holdings. 2.93 million (77%) showed more than 50 WorldCat holdings.
  • 1.57 million (41%) title-holdings were HathiTrust in-copyright titles. 131,000 (3%) were HathiTrust public domain titles.
    This data, combined and recombined in various ways, allowed MCLS pilot libraries to gauge the impact of different deselection scenarios. The group has proceeded cautiously but steadily, breaking a new trail for themselves and others who decide to follow. Group decisions have occurred relatively quickly, in part because the choices rest on clear data. (This is especially encouraging, since this was an ad hoc group, formed specifically for this project.)
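
Figures like the unique-title and overlap counts above come from grouping holdings by title and counting the libraries behind each one. A minimal sketch of that step follows, assuming the combined data has already been reduced to (title, library) pairs. The function names and the sample data are hypothetical.

```python
from collections import Counter


def overlap_profile(holdings):
    """Map 'number of libraries holding a title' to the number of such titles.

    `holdings` is an iterable of (title_id, library) pairs.
    """
    libraries_by_title = {}
    for title_id, library in holdings:
        libraries_by_title.setdefault(title_id, set()).add(library)
    return Counter(len(libs) for libs in libraries_by_title.values())


def titles_held_by_at_least(profile, n):
    """Number of titles held by n or more libraries."""
    return sum(count for held_by, count in profile.items() if held_by >= n)


def percent(part, whole):
    """Whole-number percentage, as reported in the highlights."""
    return round(100 * part / whole)


sample = [
    ("t1", "A"), ("t1", "B"), ("t1", "C"), ("t1", "D"),
    ("t2", "A"),
    ("t3", "B"), ("t3", "C"),
]
profile = overlap_profile(sample)
print("Uniquely held titles:", profile[1])
print("Titles held by 4 or more:", titles_held_by_at_least(profile, 4))
print("Share not circulating:", percent(1_740_000, 3_800_000), "%")
```

The last line simply re-derives the 46% figure from the two counts reported above.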

    We experienced (and at times created) data errors, but each in turn has been corrected, clarified, or restated as necessary. There are still policy issues to be worked through, especially around retention commitments. But together, MCLS, the pilot libraries, and SCS have made solid progress in creating an infrastructure for the shared management of print monographs.

    The whole project has been and continues to be a remarkably pleasant experience. So, from the Granite State, the Bay State, and the Beaver State (components of the virtual SCS organization) to our collaborators in the Wolverine State, we say thank you for living up to Michigan's motto: "Si Quaeris Peninsulam Amoenam Circumspice" ;-) Or to paraphrase the translation of another Latin saying on behalf of the MCLS-SCS enterprise: 'We came. We crunched. We concurred.' We look forward to the next steps.

    Thursday, March 1, 2012

    Data with Benefits

    Initial batch data extract from library
    Sustainable Collection Services (SCS), the company I run with three business partners, provides decision-support for print monographs deselection. SCS processes are built on data and batch processing. We first import the library's bibliographic, item, and circulation data. That data is then normalized, cleansed, and compared to other data sets such as HathiTrust, WorldCat, peer libraries, and authoritative lists. Library-defined rules then operate against the resulting superset of data, enabling selectors or administrators to gauge the effect of different deselection criteria. Ultimately, candidate lists for withdrawal and preservation are produced.

    That is the service we planned to build. It is the service we actually have built and applied to numerous library projects over the past year. But it turns out to be only part of our business. Working with large monographs data sets also creates opportunities for validation, remediation, analysis, and batch processing. Initially, we regarded these as side benefits. Now we are beginning to think of them as integral to the overall SCS service. Consider some simple examples:

    • Missing or Invalid OCLC Control Numbers: It is fairly common for some portion of a library's cataloging records to lack OCLC numbers. In other cases, those numbers may be truncated or malformed. For records without valid OCLC numbers, SCS uses a combination of LCCN and string-similarity matching to identify likely record matches and corresponding control numbers. These can be returned to the library in a batch to enable update of its catalog. 
    • OCLC Holdings Not Set: As SCS queries the WorldCat API to look up summary and peer holdings, it becomes apparent that in some cases the library's own holding has not been set. We can report these instances to the library, and produce a list that enables batch holdings update--sort of a miniature reclamation project.
    • Profile of a Group Collection: In one recent project with a pilot group of seven libraries, SCS identified uniquely-held titles for each participant, as well as the degree of overlap on all others. Combined with corresponding circulation data, this enabled identification of a sweet spot for shared print commitments. There are many possibilities in this area.
    • Print/E-Book Overlap: Provided SCS has a library's records for both print and electronic books, it is increasingly possible to determine whether a low-circulation print title is also held as an e-book. There are many caveats here (e.g., it may be important to distinguish whether the e-book is owned, rather than simply available as part of a package or a patron-driven acquisition record). But this overlap is of interest to many libraries.
    • FRBR-on/FRBR-off: Edition matching is a critical element of deselection and collection analysis. For archiving and preservation purposes, exact matches are imperative. For user purposes, exact matches are sometimes important and sometimes not. SCS holdings lookups start with FRBR groupings off, a conservative approach that assures edition-specific matches. For titles that return few holdings, we then re-run the lookups with FRBR groupings on, returning these "softer" matches to the library for review.
    • Batch Processing Support: Deselection projects create record maintenance work, regardless of whether titles will be transferred or withdrawn. Some record maintenance steps (e.g., suppression, location changes) can be completed as batch processes, based on lists that include local control numbers and necessary data elements. Often, SCS can produce labor-saving lists of this sort from the data we hold. 
    Remediated batch of data en route to library...
    In each project, we encounter additional opportunities to derive new value from the data. In shared print projects, for instance, it will increasingly prove useful to highlight retention commitments as well as withdrawal opportunities. These are most efficiently handled as batch processes. As always, we are limited only by the data itself and our own creativity. We will continue to look for more ways to benefit from the effort that goes into deselection projects. Some solutions may be partial in scope, but in a large data set even partial solutions can save many hours of staff time.
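
To make the first example in the list above more concrete, here is a minimal sketch of matching a record that lacks a valid OCLC number against candidate records, trying an exact LCCN match first and falling back to title similarity. It is an illustration only, not SCS's matching process. The record layout, the 0.9 threshold, and the use of Python's `difflib` are assumptions, and the OCLC number check is a rough heuristic.

```python
import re
from difflib import SequenceMatcher


def looks_like_oclc_number(value: str) -> bool:
    """Rough heuristic: optional (OCoLC) and ocm/ocn/on prefixes, then digits."""
    pattern = r"(?:\(OCoLC\))?(?:ocm|ocn|on)?0*[1-9]\d*"
    return re.fullmatch(pattern, value.strip()) is not None


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def best_match(record: dict, candidates: list, threshold: float = 0.9):
    """Return (oclc_number, score) for the likeliest candidate, or None.

    Records are dicts with 'title', optional 'lccn', and (for candidates) 'oclc'.
    """
    lccn = record.get("lccn")
    if lccn:
        for candidate in candidates:
            if candidate.get("lccn") == lccn:
                return candidate["oclc"], 1.0

    target = normalize(record["title"])
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate["title"])).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is not None and best_score >= threshold:
        return best["oclc"], best_score
    return None  # leave for manual review rather than guess
```

Matches found this way would still go back to the library as a list for review and batch update, and anything below the threshold is left alone.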

    Tuesday, February 21, 2012

    Bibliometrics and Book Retention

    As I've stated in other contexts, selection and deselection represent the same intellectual activity, performed at different points in a book's lifecycle. Deselection has one significant advantage, though. It can be based on a track record of circulation, in-house use, and appearance on authoritative lists. We began to explore yet another type of historical evidence in a previous post on The Impact of Books: citation counts. Although it seems reasonable to presume that the number of citations to a book would correlate with discovery and use, we need a deeper understanding of the underlying dynamics. Highly-cited books seem likely to be important books, books worth keeping, books more likely to be wanted in future.

    Bibliometrics "uses quantitative analysis or statistics to describe patterns of publication within a given field or body of literature." Not surprisingly, bibliometric techniques originated in the hard sciences and in the journal literature, but they are now used in many disciplines and increasingly on monographs. Historically, citation analysis has been used to evaluate researchers and departments, and to gauge the impact of a contribution to its discipline. Our purposes are related but somewhat narrower. We are seeking to identify high-impact books within a discipline to assure that they are retained. Can bibliometrics help identify these titles? What can citation patterns tell us about how intellectual content ages in specific disciplines?

    Conceptually, this turns out to be a rich vein, and the literature and data run deep. Consider some of these potential data points:
    • Total number of citations: a straightforward measure of citation frequency. However, it may be useful to distinguish between journal-to-book citations and book-to-book citations. The former can be easily (though partially) retrieved through journal indexes. The latter are beginning to be identified using Google Books and HathiTrust, but at present are largely unavailable.
    • Average citation frequency: Number of citations per monograph in a discipline. Used to compare activity among disciplines.
    • Citation peak: date after publication at which the maximum number of citations occurs.
    • Noncitation ratio or Uncitedness Index: absence of citations in a defined time period.
    • Price's Index (citation recency): "calculates the proportion of the number of citations no more than five years old over the total number of citations an item receives."
    • Half-life of citations: a measure of "obsolescence" of scholarly literature, obtained by "subtracting the publication year of source documents from the median publication year of citing documents."
    • Reference decay: the point after which 90% of citations to a work occur.
    There are obvious implications here for monographs deselection and retention. These measures provide one kind of insight into the impact and staying power of individual works. They also enable identification of content aging patterns at the disciplinary level, especially when examined by the periods of "knowledge diffusion" or "intellectual acceptance" developed by Lindholm-Romantschuk and Warner (in "The Role of Monographs in Scholarly Communication: An Empirical Study of Philosophy, Sociology and Economics"). These periods are:
    • Initial Reception: "the period of three calendar years from publication (including the year of publication)."
    • Intellectual Survival: the number of years after initial reception that a book continues to be cited.
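
Several of the measures listed earlier, along with these two periods, can be computed from nothing more than a book's publication year and the years of the works citing it. The sketch below follows the definitions as quoted in this post. The function names and sample data are hypothetical, and the readings of "no more than five years old" (citing year minus publication year is 5 or less) and of ties at the citation peak (earliest wins) are assumptions.

```python
from collections import Counter
from statistics import median


def citation_measures(pub_year, citing_years):
    """Per-book measures from the publication years of citing documents."""
    if not citing_years:
        return None  # an uncited book; see uncitedness() below

    ages = [year - pub_year for year in citing_years]
    per_age = Counter(ages)
    # Years after publication with the most citations (earliest on ties).
    peak_age = max(sorted(per_age), key=per_age.get)
    # Initial reception: publication year plus the two following calendar years.
    reception_end = pub_year + 2

    return {
        "total_citations": len(citing_years),
        "citation_peak": peak_age,
        "prices_index": sum(age <= 5 for age in ages) / len(ages),
        "half_life": median(citing_years) - pub_year,
        "initial_reception": sum(year <= reception_end for year in citing_years),
        "intellectual_survival": max(0, max(citing_years) - reception_end),
    }


def uncitedness(citation_lists):
    """Share of books with no citations in the period covered by the data."""
    return sum(1 for cites in citation_lists if not cites) / len(citation_lists)


# Hypothetical book published in 2000 and cited eight times.
measures = citation_measures(2000, [2000, 2001, 2001, 2003, 2006, 2006, 2006, 2010])
for name, value in measures.items():
    print(f"{name}: {value}")
```

Run across every title in a discipline, the same per-book values could be averaged or plotted to show how quickly that discipline's monographs age.
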
    In an eye-opening 2008 article entitled "Citation Characteristics and Intellectual Acceptance of Scholarly Monographs," Professor Rong Tang of Simmons College employs a number of these concepts to "explore disciplinary difference in the citing of books." Her work centers on 750 randomly selected monographs, 125 each in Religion, History, Psychology, Economics, Math, and Physics. The study seeks to answer two research questions:
    "Are there significant domain or disciplinary differences in the distribution of citations to monographs, half-lives, and Price's Index?"
    "If conditioned on the periods of intellectual acceptance, are there significant differences among disciplines in terms of citation frequency and number of books cited per period?"
    The article presents its methods, concepts, and results clearly. It is well worth reading in its entirety. The table reproduced below begins to show the potential variability across disciplines:

    Rong Tang, "Citation Characteristics and Intellectual Acceptance of Scholarly Monographs"

    Some of its more surprising results include:
    • Psychology received the highest number of citations, with more than 6,000 and an average of 48.1 citations per monograph, followed by math and physics. History received an average of 3.2 citations per item.
    • Physics has the longest half-life, while humanities disciplines have the shortest.
    • The highest uncitedness ratios occurred in history (52%) and religion (59%).
    • "...the peak time of citations for six disciplines all occurred within the first 20 years of publication."
    • "Religion and history reached their highest citation amount within the first five years...whereas psychology, physics and mathematics did not receive their citation heyday until more than six years after publication."
    • Citations of most disciplines increase at six years after publication. "The highest potential period of intellectual acceptance is the first 10 years, with the decline and gradual ending of citations during the 11th to 30th years..."
    It will take time and experimentation to determine how applicable some of these ideas and findings may be to book retention decisions. The results need to be qualified: the sample size was small; it considers only article-to-book citations, not book-to-book citations, which may under-represent humanities citations. But the article provides an excellent foundation. A hearty thanks to Professor Tang and predecessors for providing this useful framework.

    Wednesday, February 15, 2012

    The Impact of Books


    Effect of heavily-cited monograph
    During a recent monographs deselection project, an astute librarian inquired whether a book's "impact factor" -- the number of times it has been cited in other books or journals -- might be invoked as a title protection rule. Impact factor, of course, is a concept much more highly developed for journals and conference proceedings than for monographs. Often described as a quantitative tool for evaluating journals, impact factor captures the frequency with which an article has been cited in a three-year period. At the journal title level, it captures the average number of citations per paper. Results are published annually in Journal Citation Reports. While not without controversy as a performance metric, impact factor is widely used as a shorthand indicator of article and journal quality.

    Recently, an impact factor for books has begun to receive some overdue attention. In late 2011, Thomson Reuters introduced the Book Citation Index, available through its Web of Knowledge platform. Despite its bold taglines of "putting books back into the library" and "completing the research picture", it represents a fairly modest beginning. By December 2011, it was projected to include 30,000 titles, with a plan to add 10,000 per year. The Thomson Reuters site describes a careful selection process, and highlights improved discovery and citation navigation as the Index's primary attributes. But there is a clear implication that these are important monographs in their respective fields.
       

    This implication is not without controversy. Metrics such as citation analysis raise the hackles of some researchers, especially in the humanities and social sciences, as shown in a lively exchange of comments following this article from Times Higher Education: "Monographs finally join citations database."  On October 13/14, 2011, a Mr Flannigan let it be known that:
    "The field of citation counting isn't a 'field' in any intellectual sense. It's a shortcut; an attempt to evade engagement with intellectual content and reduce everything to the logic of a spreadsheet."
     "I don't doubt that some disciplines might benefit from citation counting. But I'm sick of scientists imposing their methods onto non-cognate disciplines and demanding that everyone else fall into line."
    Several recent articles further explore book and even chapter-level impact using sources other than BCI. "Assessing the citation impact of books: the role of Google Books, Google Scholar, and Scopus", published in November 2011, examines whether these databases can provide "alternative sources of citation evidence", and specifically looks at references to and from books. Planned data mining of the HathiTrust corpus may open up some new avenues. A 2006 account of a pilot project for the Australian Council for the Humanities, Arts, and Social Sciences tests the extension of citation analysis to books in history and political science:

    Source: Linda Butler, Council for the Humanities, Arts, & Social Sciences
     
    We'll follow up on these and other recent works on "bibliometrics" in a subsequent post. (Mark your calendars for that!) For now, let's assume that book impact factors are worth some consideration in decisions about storage, withdrawal, and retention.

    As monographs are considered for deselection, there is often a desire to exempt titles that appear on "authoritative" lists or core lists, regardless of whether those titles have been used. Examples include titles listed in Resources for College Libraries or as CHOICE Outstanding Academic Titles, or on discipline-specific accreditation lists. Clearly, titles listed in the Book Citation Index could fall into this category, and might be considered candidates for retention irrespective of other considerations, even as the debate about citation analysis continues.

    There is one very practical problem, however. Book Citation Index, as currently constituted, is limited to books with copyright dates in the current year plus 5 previous years in the Sciences, and current year plus 7 previous years in Social Sciences and Humanities. As this is written in early 2012, then, coverage includes:
    • Sciences: books published in 2007 or later
    • Social Sciences & Humanities: books published in 2005 or later
    To date, deselection criteria in the projects supported by our firm Sustainable Collection Services have focused on titles published or acquired before 2005--sometimes much earlier. The universe of titles being considered for withdrawal and the universe of most-cited titles in Book Citation Index at present do not overlap at all. For now, impact factor simply cannot play a role in deselection decisions. The relevant data does not yet exist in any consolidated form.
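
The lack of overlap is simple arithmetic on the coverage windows described above, sketched here with hypothetical function names. The only inputs are the current year, the number of previous years covered, and a "published before" cutoff for deselection.

```python
def coverage_start(current_year: int, previous_years: int) -> int:
    """Earliest copyright year covered: current year plus N previous years."""
    return current_year - previous_years


def windows_overlap(coverage_from: int, published_before: int) -> bool:
    """True if any title published before the cutoff falls inside the coverage window."""
    latest_deselection_year = published_before - 1
    return latest_deselection_year >= coverage_from


CURRENT_YEAR = 2012
sciences_from = coverage_start(CURRENT_YEAR, 5)
social_sciences_humanities_from = coverage_start(CURRENT_YEAR, 7)

for label, start in [
    ("Sciences", sciences_from),
    ("Social Sciences & Humanities", social_sciences_humanities_from),
]:
    overlap = windows_overlap(start, published_before=2005)
    print(f"{label}: covered from {start}; overlaps pre-2005 candidates: {overlap}")
```

Changing `published_before` to a later year shows when the two universes would start to meet, for example once titles published in 2005 or later begin to appear on withdrawal candidate lists.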

    As the list of titles grows over time, it will become more relevant. But the role of book impact factor in deselection will emerge only as titles published in 2005 and later begin to appear on withdrawal candidate lists. The utility of the impact factor will grow incrementally; under the Book Citation Index model, 10,000 additional titles will be available for analysis each year. In five or ten years, this may be an important data point. But not quite yet. In fact, it may not be necessary at all, since presumably highly-cited books would tend to receive more use. And in deselection decisions, use trumps most other considerations.

    Wednesday, January 25, 2012

    Browsing Now (2)

    Browsing and serendipity are not limited to the book stacks. Skimming and scanning are habits of mind, and can lead to unexpected discoveries anywhere. Like millions of other people, I use Twitter to bring a mix of relevant and entertaining content to my attention. While Twitter's brief messages and links rarely include books, they do provide a loosely-shaped browsing experience that often leads to useful information I might not find otherwise.

    On January 12th, 2012, a small snapshot of my Twitter feed included the following.
    @lorcanD: "NYT Windows phone app is very nice, while the Guardian's is lazy."
    @ChuckProphet [musician]: "Sometimes Christians are so mean."
    @GreatDismal [author]: "Signed Hungarian completist's amazing collection of my work in Hungarian. Many rarities I hadn't seen before."
    @GreatDismal: "Sorry I wasn't tooled up to sign tablets." (A problem later rectified--above).
    @latimes: "James Joyce moves into the public domain, mostly." [link]
    @lorcanD: "the future of collections and collections management. interesting pres by Caroline Brazier of BL. ppt" [link to powerpoint]
    In less than a minute, I gleaned several unexpected thoughts (autographing books is changing), developments ("Joyce's unpublished work, particularly his letters, will [now] be available to scholars"), questions (there are people who use Windows phones?) and two substantive links, without actively searching for any of them. Echoing David Weinberger's characterization of the web, these were "small pieces, loosely joined."

    So browsing itself continues to advance and morph, as do the formats and content found. Pictures, news blogs, opinions, observations by interesting artists, even an occasional book. But the prize of this day's group of links proved to be the slides from Caroline Brazier's presentation on "Collect/connect: the future of collections and collections management." It's a substantive exposition of the changes facing libraries, prepared to inform the British Library's 10-year strategy. But it also showcases some additional attributes of professional information and grey literature in the early 21st century. Ms. Brazier's work is:

    • Timely. Delivered on October 27, 2011 in Adelaide, Australia.
    • Authoritative. Authored by the Director of Scholarship & Collections at the British Library.
    • Linked from a trusted source. Lorcan Dempsey's tweets regularly turn up interesting targets, indicating context and format.
    • Topical. Future of tangible and digital collections, curating a discovery layer.
    • Graphical. 39 slides with minimal text. Images, tables, and graphs drive the message. (Grey literature a true misnomer here!)
    • Freely available. Like a library. Amazing how much valuable content fits this description.
    • Multi-media. A full MP3 audio is available to accompany the slides.
    • Easy to share. Links, extracts, copies.
    • Useful. A powerful new graphic for thinking about shared print.

    A browsing nugget, January 12, 2012
    • Discovered serendipitously. Scrolling through dozens of unrelated entries, far from the book stacks, far from the library.