Tuesday, February 21, 2012

Bibliometrics and Book Retention

As I've stated in other contexts, selection and deselection represent the same intellectual activity, performed at different points in a book's lifecycle. Deselection has one significant advantage, though: it can be based on a track record of circulation, in-house use, and appearance on authoritative lists. We began to explore yet another type of historical evidence in a previous post on The Impact of Books: citation counts. Although it seems reasonable to presume that the number of citations to a book would correlate with discovery and use, we need a deeper understanding of the underlying dynamics. Highly-cited books seem likely to be important books, books worth keeping, books more likely to be wanted in the future.

Bibliometrics "uses quantitative analysis or statistics to describe patterns of publication within a given field or body of literature." Not surprisingly, bibliometric techniques originated in the hard sciences and in the journal literature, but they are now used in many disciplines and increasingly on monographs. Historically, citation analysis has been used to evaluate researchers and departments, and to gauge the impact of a contribution to its discipline. Our purposes are related but somewhat narrower. We are seeking to identify high-impact books within a discipline to assure that they are retained. Can bibliometrics help identify these titles? What can citation patterns tell us about how intellectual content ages in specific disciplines?

Conceptually, this turns out to be a rich vein, and the literature and data run deep. Consider some of these potential data points:
  • Total number of citations: a straightforward measure of citation frequency. However, it may be useful to distinguish between journal-to-book citations and book-to-book citations. The former can be easily (though partially) retrieved through journal indexes. The latter are beginning to be identified using Google Books and Hathi Trust, but at present are largely unavailable.
  • Average citation frequency: the number of citations per monograph in a discipline; used to compare activity among disciplines.
  • Citation peak: the date after publication at which the maximum number of citations occurs.
  • Noncitation ratio or Uncitedness Index: the absence of citations in a defined time period.
  • Price's Index (citation recency): "calculates the proportion of the number of citations no more than five years old over the total number of citations an item receives."
  • Half-life of citations: a measure of "obsolescence" of scholarly literature, obtained by "subtracting the publication year of source documents from the median publication year of citing documents."
  • Reference decay: the point by which 90% of citations to a work have occurred.
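Several of these measures reduce to simple functions of citation dates. As a minimal sketch (in Python, using hypothetical data; exact operationalizations vary in the bibliometrics literature), the indicators above might be computed like this:

```python
import math
from statistics import median

def citation_metrics(pub_year, citing_years, current_year):
    """Compute simple bibliometric indicators for one monograph.
    citing_years: publication years of the works citing it."""
    total = len(citing_years)
    if total == 0:
        return {"total": 0, "uncited": True}
    # Price's Index: share of citations no more than five years old.
    recent = sum(1 for y in citing_years if current_year - y <= 5)
    # Half-life: median citing year minus the publication year.
    half_life = median(citing_years) - pub_year
    # Citation peak: years after publication with the most citations.
    peak = max(set(citing_years), key=citing_years.count) - pub_year
    # Reference decay: years after publication by which 90% of
    # citations have occurred.
    ordered = sorted(citing_years)
    decay = ordered[math.ceil(0.9 * total) - 1] - pub_year
    return {"total": total, "uncited": False,
            "prices_index": recent / total,
            "half_life": half_life, "peak": peak, "decay": decay}
```

Aggregating these per-title figures across a sample of monographs in a discipline would yield the discipline-level comparisons (average citation frequency, uncitedness ratio) discussed below.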
There are obvious implications here for monograph deselection and retention. These measures provide one kind of insight into the impact and staying power of individual works. They also enable identification of content aging patterns at the disciplinary level, especially when examined by the periods of "knowledge diffusion" or "intellectual acceptance" developed by Lindholm-Romantschuk and Warner (in "The Role of Monographs in Scholarly Communication: An Empirical Study of Philosophy, Sociology and Economics"). These periods are:
  • Initial Reception: "the period of three calendar years from publication (including the year of publication)."
  • Intellectual Survival: the number of years after initial reception that a book continues to be cited.
In an eye-opening 2008 article entitled "Citation Characteristics and Intellectual Acceptance of Scholarly Monographs," Professor Rong Tang of Simmons College employs a number of these concepts to "explore disciplinary difference in the citing of books." Her work centers on 750 randomly selected monographs, 125 each in religion, history, psychology, economics, math, and physics. The study seeks to answer two research questions:
"Are there significant domain or disciplinary differences in the distribution of citations to monographs, half-lives, and Price's Index?"
"If conditioned on the periods of intellectual acceptance, are there significant differences among disciplines in terms of citation frequency and number of books cited per period?"
The article presents its methods, concepts, and results clearly. It is well worth reading in its entirety. The table reproduced below begins to show the potential variability across disciplines:

Rong Tang, "Citation Characteristics and Intellectual Acceptance of Scholarly Monographs"

Some of its more surprising results include:
  • Psychology received the highest number of citations, with more than 6,000 and an average of 48.1 citations per monograph, followed by math and physics. History received an average of 3.2 citations per item.
  • Physics has the longest half-life, while humanities disciplines have the shortest.
  • The highest uncitedness ratios occurred in history (52%) and religion (59%).
  • "...the peak time of citations for six disciplines all occurred within the first 20 years of publication."
  • "Religion and history reached their highest citation amount within the first five years...whereas psychology, physics and mathematics did not receive their citation heyday until more than six years after publication."
  • Citations in most disciplines increase at six years after publication. "The highest potential period of intellectual acceptance is the first 10 years, with the decline and gradual ending of citations during the 11th to 30th years..."
It will take time and experimentation to determine how applicable some of these ideas and findings may be to book retention decisions. The results need to be qualified: the sample size was small, and the study considers only article-to-book citations, not book-to-book citations, which may under-represent humanities citations. But the article provides an excellent foundation. A hearty thanks to Professor Tang and her predecessors for providing this useful framework.

Wednesday, February 15, 2012

The Impact of Books


Effect of heavily-cited monograph
During a recent monographs deselection project, an astute librarian inquired whether a book's "impact factor" -- the number of times it has been cited in other books or journals -- might be invoked as a title protection rule. Impact factor, of course, is a concept much more highly developed for journals and conference proceedings than for monographs. Often described as a quantitative tool for evaluating journals, impact factor captures the frequency with which an article has been cited over a three-year period; at the journal title level, it reflects the average number of citations per paper. Results are published annually in Journal Citation Reports. While not without controversy as a performance metric, impact factor is widely used as a shorthand indicator of article and journal quality.

Recently, an impact factor for books has begun to receive some overdue attention. In late 2011, Thomson Reuters introduced the Book Citation Index, available through its Web of Knowledge platform. Despite its bold taglines of "putting books back into the library" and "completing the research picture", it represents a fairly modest beginning. By December 2011, it was projected to include 30,000 titles, with a plan to add 10,000 per year. The Thomson Reuters site describes a careful selection process, and highlights improved discovery and citation navigation as the Index's primary attributes. But there is a clear implication that these are important monographs in their respective fields.

This implication is not without controversy. Metrics such as citation analysis raise the hackles of some researchers, especially in the humanities and social sciences, as shown in a lively exchange of comments following this article from Times Higher Education: "Monographs finally join citations database." On October 13/14, 2011, a Mr. Flannigan let it be known that:
"The field of citation counting isn't a 'field' in any intellectual sense. It's a shortcut; an attempt to evade engagement with intellectual content and reduce everything to the logic of a spreadsheet."
 "I don't doubt that some disciplines might benefit from citation counting. But I'm sick of scientists imposing their methods onto non-cognate disciplines and demanding that everyone else fall into line."
Several recent articles further explore book and even chapter-level impact using sources other than BCI. "Assessing the citation impact of books: the role of Google Books, Google Scholar, and Scopus", published in November 2011, examines whether these databases can provide "alternative sources of citation evidence", and specifically looks at references to and from books. Planned data mining of the Hathi Trust corpus may open up some new avenues. A 2006 account of a pilot project for the Australian Council for the Humanities, Arts, and Social Sciences tests the extension of citation analysis to books in history and political science:

Source: Linda Butler, Council for the Humanities, Arts, & Social Sciences
 
We'll follow up on these and other recent works on "bibliometrics" in a subsequent post. (Mark your calendars for that!) For now, let's assume that book impact factors are worth some consideration in decisions about storage, withdrawal, and retention.

As monographs are considered for deselection, there is often a desire to exempt titles that appear on "authoritative" lists or core lists, regardless of whether those titles have been used. Examples include titles listed in Resources for College Libraries or as CHOICE Outstanding Academic Titles, or on discipline-specific accreditation lists. Clearly, titles listed in the Book Citation Index could fall into this category, and might be considered candidates for retention irrespective of other considerations, even as the debate about citation analysis continues.

There is one very practical problem, however. Book Citation Index, as currently constituted, is limited to books with copyright dates in the current year plus 5 previous years in the Sciences, and current year plus 7 previous years in Social Sciences and Humanities. As this is written in early 2012, then, coverage includes:
  • Sciences: books published in 2007 or later
  • Social Sciences & Humanities: books published in 2005 or later
To date, deselection criteria in the projects supported by our firm Sustainable Collection Services have focused on titles published or acquired before 2005--sometimes much earlier. The universe of titles being considered for withdrawal and the universe of most-cited titles in Book Citation Index at present do not overlap at all. For now, impact factor simply cannot play a role in deselection decisions. The relevant data does not yet exist in any consolidated form.

As the list of titles grows over time, it will become more relevant. But the role of book impact factor in deselection will emerge only as titles published in 2005 and later begin to appear on withdrawal candidate lists. The utility of the impact factor will grow incrementally; under the Book Citation Index model, 10,000 additional titles will be available for analysis each year. In five or ten years, this may be an important data point. But not quite yet. In fact, it may not be necessary at all, since presumably highly-cited books would tend to receive more use. And in deselection decisions, use trumps most other considerations.

Wednesday, January 25, 2012

Browsing Now (2)

Browsing and serendipity are not limited to the book stacks. Skimming and scanning are habits of mind, and can lead to unexpected discoveries anywhere. Like millions of other people, I use Twitter to bring a mix of relevant and entertaining content to my attention. While Twitter's brief messages and links rarely include books, they do provide a loosely-shaped browsing experience that often leads to useful information I might not find otherwise.

On January 12th, 2012, a small snapshot of my Twitter feed included the following:
@lorcanD: "NYT Windows phone app is very nice, while the Guardian's is lazy."
@ChuckProphet [musician]: "Sometimes Christians are so mean."
@GreatDismal [author]: "Signed Hungarian completist's amazing collection of my work in Hungarian. Many rarities I hadn't seen before."
@GreatDismal: "Sorry I wasn't tooled up to sign tablets." (A problem later rectified--above).
@latimes: "James Joyce moves into the public domain, mostly." [link]
@lorcanD: "the future of collections and collections management. interesting pres by Caroline Brazier of BL. ppt" [link to powerpoint]
In less than a minute, I gleaned several unexpected thoughts (autographing books is changing), developments ("Joyce's unpublished work, particularly his letters, will [now] be available to scholars"), questions (there are people who use Windows phones?) and two substantive links, without actively searching for any of them. Echoing David Weinberger's characterization of the web, these were "small pieces, loosely joined."

So browsing itself continues to advance and morph, as do the formats and content found. Pictures, news blogs, opinions, observations by interesting artists, even an occasional book. But the prize of this day's group of links proved to be the slides from Caroline Brazier's presentation on "Collect/connect: the future of collections and collections management." It's a substantive exposition of the changes facing libraries, prepared to inform the British Library's 10-year strategy. But it also showcases some additional attributes of professional information and grey literature in the early 21st century. Ms. Brazier's work is:

  • Timely. Delivered on October 27, 2011 in Adelaide, Australia.
  • Authoritative. Authored by the Director of Scholarship & Collections at the British Library.
  • Linked from a trusted source. Lorcan Dempsey's tweets regularly turn up interesting targets, indicating context and format.
  • Topical. Future of tangible and digital collections, curating a discovery layer.
  • Graphical. 39 slides with minimal text. Images, tables, and graphs drive the message. (Grey literature a true misnomer here!)
  • Freely available. Like a library. Amazing how much valuable content fits this description.
  • Multi-media. A full MP3 audio is available to accompany the slides.
  • Easy to share. Links, extracts, copies.
  • Useful. A powerful new graphic for thinking about shared print.

A browsing nugget, January 12, 2012
  • Discovered serendipitously. Scrolling through dozens of unrelated entries, far from the book stacks, far from the library.

Tuesday, January 17, 2012

Browsing Now

Browsing in the stacks
At some point in almost any discussion of weeding, storage, or shared print, concern about browsing surfaces. If large portions of the print collection are withdrawn, moved to offsite storage, or are held only in other libraries, in-stacks browsing will be disrupted. A rich possibility for serendipitous discovery will be eliminated. If books are not co-located, opportunities and connections may be missed. Scholars in the humanities, in particular, regard the library collection as their "laboratory", and are often vocal in their opposition to any plan that requires physical relocation of books. A good example of these strong feelings comes from an article entitled "In Face of Professors' 'Fury', Syracuse U. Library Will Keep Books On Shelves," which appeared in the November 12, 2009 issue of the Chronicle of Higher Education:
The reaction was so fierce because of the high value humanities researchers still place on hands-on browsing, Mr. Watts said. "The big issue in the letters and among humanists generally is the importance of being able to browse collections and not have them in a remote location."

Begin here, circa 1975
Personally, I can't count the number of times I've benefited from unexpected proximity. Serendipitous discovery can be both surprising and satisfying. While a student at the University of New Hampshire in the 1970s, I would copy down the call number of a title I needed, and head to the stacks. Once in the designated area, my reverse introduction into the mysteries of LC classification and Cuttering began. More than once, I found something more interesting or relevant in the surrounding volumes than what I originally sought. This was especially helpful when my target item was not actually on the shelf. The loose and at times arbitrary-seeming aggregation of books by subject presented me with options I might not have discovered without physical browsing, especially in the days of card catalogs. 

But I sometimes found myself impeded and mystified by the same cataloging practices that rewarded my browsing. Why were Thomas Merton's 'Raids on the Unspeakable' and 'My Argument with the Gestapo' lodged on the 5th floor in PS 3525 while his 'New Seeds of Contemplation' and 'Asian Journal' resided on the 4th floor in BX 2350? I put it down to the perils of being a monk and a poet simultaneously. I did occasionally wonder about his accidental shelf neighbors in both realms, like Robert Merton or Henri Nouwen. This system certainly fostered serendipity and even curiosity, but not entirely as the result of, ah, intelligent design.

The insistence on maintaining browsable physical collections in open stacks overstates the value of what has always been a partial and complementary research strategy. The largest and best libraries do not hold every work on a subject; access to other collections is needed, and that access relies on bibliographic data rather than direct examination. Even in a single library, books and journals on the same topic are typically separated, with unified discovery supported through bibliographies, indexes, and databases. And physical browsing is actually a very recent phenomenon. Closed stacks were a mainstay of academic libraries until the middle of the 20th century. As Donald A. Barclay, Deputy University Librarian at the University of California, Merced, noted in an American Libraries blog entry, entitled "The Myth of Browsing:" 
Prior to the Second World War, the typical academic library was neither designed nor managed to support the browsing of collections. At best, faculty might be allowed to browse, but it was the rare academic library that allowed undergraduates into the stacks.
And of course browsing and unexpected discovery are not limited to the book stacks. In double-checking my memories from the 1970s, today's UNH online catalog provided search results sortable by call number, enabling an instant virtual scan of the shelves around my Thomas Merton titles. Those shelves are 50 miles from where I sit. In many respects, the ability to browse online is superior to in-stacks browsing. I could follow author and subject links, range beyond the UNH catalog to consult the combined collections of the Boston Library Consortium or WorldCat, scan tables of contents, and in general follow trails that would not have been open to me while walking the aisles. This sort of browsing in the cloud is not the same as browsing in the stacks, but does offer many of the same advantages, some new ones, and a good deal more convenience. (They don't call Firefox a browser for nothing.)

Browsing in the cloud
The current tools for online browsing are far from perfect. Online browsing is only one component of a research strategy. But it may be (or may be developed as) a reasonable stand-in for in-stacks browsing, especially as libraries add more and more eBook titles. As economic pressures and user preference for electronic content grow, libraries must reconsider the value of onsite print collections. In most cases, this will mean smaller collections in central stacks, and remote access to lesser-used materials. While this may limit physical browsing, it does not preclude serendipity or scanning for connections. These result from the openness and imagination of users, and will persist regardless of how information is stored. Rather than insisting on all books remaining in situ, we may be better served by adapting our browsing proclivities to new tools.

Monday, January 2, 2012

Meet the Press

The act of removing books from library shelves carries a shocking load of emotional and cultural baggage. Academic libraries are seen as guardians of the published record, and more specifically the printed record. To most library users (and to a good share of librarians), books are the DNA of a beloved institution--the core of its identity. At some fundamental level, deselection appears to betray that identity and its corresponding cultural commitment.

Image and article  from Cracked.com

But while the printed book remains a vital part of scholarly and cultural communication, it is painfully obvious that its primacy has been eclipsed by electronic resources. Use of books is far lower than most people realize. Millions of redundant volumes are sitting untouched in open stacks and storage facilities. They occupy space and time that could be used for other purposes. As a professional community, we have a responsibility to address this, even in the face of misunderstanding and resistance.

But we also need to explain our thinking and our actions. We need to educate faculty, students, administrators, and even library staff about deselection. As with any difficult topic, effective communication about weeding, storage, and shared print decisions requires not only mastery of the relevant data, but direct engagement with concerns and objections. We need to manage the message as actively as we manage the decision-making and logistics. We need to provide context. And we need to take the initiative.

The temptation to hide or downplay deselection projects can be quite strong, since this activity is often perceived so negatively. Deselection can attract close scrutiny, inflammatory language, television cameras, and protests. Stories and misunderstandings abound, but let's start with a big one that occurred 15 years ago.

In the October 14, 1996 issue of The New Yorker, author Nicholson Baker confronted San Francisco Public Library about books removed after retrospective conversion of the library's catalog. The article, entitled "The Author Vs. The Library" [abstract], set an adversarial tone and engendered suspicion around motives and practices for removing library material--suspicion that persists to this day. Some of that suspicion is warranted. We need watchdogs. We also need to be clear about why some books must be withdrawn. We need to assure that criteria for weeding are well-founded, and that the process is sufficiently transparent. We need to articulate the problem, the solution, and the benefits.

Baker attacked when he learned that 200,000 books had been sent to landfills. There were unusual aspects to this case, and clearly some questionable decisions made in haste. But even Baker's informants from the library agreed that "maybe a quarter of them, fifty thousand, should have been thrown out." And while the article teems with examples of potentially valuable discarded titles, those lists provide no context. No consideration is given to circulation rates, to how many other libraries hold these works, to a title's relevance to the Library's mission, or to the cost of retention. (Digital versions were not a significant factor in 1996.) In short, the picture as sketched by Baker was incomplete.

It is our task as responsible librarians to fill in the background and to complete that picture. In some cases, even good books (or rather, surplus copies of good books) may warrant weeding and discard. We need to get better at making the case and showing that no content will be lost. We need to welcome the press and make our decisions/criteria transparent. In subsequent posts, we will look at some libraries that have handled these aspects of deselection especially well. But for today, let's recall some good practical advice from an academic library that pursued a major weeding and storage project between 1995 and 2002. Virginia Tech must have been all too aware of Nicholson Baker's piece as they removed 100,000 volumes each year between 1996 and 1998.

In the May 2005 issue of The Journal of Academic Librarianship, Paul Metz and Caryl Gray looked back on that experience, and offered their "Perspectives on... Public Relations and Library Weeding" [WorldCat record]. This article is an excellent short synthesis of what the Library learned about communicating with its community before, during, and after its massive deselection project. Metz and Gray's advice focuses on six key topics:
  • Advance Publicity: Consulting faculty 'early and often' is critical. For example, "An article published in the Spectrum, a faculty/staff newsletter, presented an overview of the shelf-load problem, outlined the strategies that would be employed, and invited academic departments to participate in the process."
  • Clear Criteria: "The criteria and guidelines for the project were shared with interested faculty members and an opportunity to review items selected for discard was offered."
  • Flexibility: "The Libraries expressed its willingness to transfer storage items back to campus" [...] "A library's willingness to allow visitors to come to the Storage building and examine items on site is another important and beneficial element in flexibility."
  • Good Deeds: "...[the English bibliographer] listed discards she thought would be of interest and offered them for transfer to the department. By the end of the project 4016 books had been sent to several members of the department."
  • Quick Response from the Top: "The [Director of Collection Development] answered all questions immediately. [...] With many critics it helped to give tangible examples..." [...] "Who would want a book on personal finance that predates the personal computer and spreadsheets?"
  • Low Visibility: "...it was the Library's commitment to recycling that invited its greatest public relations challenge. Rather than send our discards to a landfill, we held them in a dumpster behind the library. The unfortunate result was that "dumpster divers" would periodically remove items and express their disapproval of the Library's decisions." [...] "After several months, the dean of Libraries decided that the 'dumpster situation' was untenable and directed that all library discards be included in the university's periodic surplus property sale." [Related post]
As is clear from these brief statements, even the best communications plans require extensive groundwork and vigilance. Despite all the preparation, a letter entitled "Tech Is Treating Unwanted Books Like Garbage" appeared in the Roanoke Times and World News. But because they were prepared, the Director of Collection Management responded within 4 days, providing corrections and context, and advancing the Library's rationale. In the end, Virginia Tech considers their project "a success. The Libraries still benefit from the enormous gains in shelf space. [...] The project is very rarely mentioned by faculty or other members of the university and seems almost to have been forgotten."

Wednesday, November 9, 2011

Deselection and 'Remedial' Discovery

Last Wednesday in Charleston, my colleagues Bob Kieft (Occidental College) and Sam Demas ('freelance librarian') and I facilitated a day-long preconference on "Shared Print Archiving: Building the Collective Collection." We benefited from a full roster of speakers experienced in management of shared print collections: Lizanne Payne (CRL, WEST, and CIC); Emily Stambaugh (California Digital Library); John MacDonald (Claremont Colleges); Kathryn Harnish (OCLC); Michael Levine-Clark (Univ of Denver); Judith Russell (Univ of Florida, ASERL); Rachel Frick (CLIR/DLF); and Doug Way (Grand Valley State University).

The hordes gather to discuss shared print archiving

Because we employed lightning talks and approached the topic broadly, discussions were more exploratory than definitive. As befits a discipline in the throes of formation, a certain element of confusion and chaos attended the day. But my overall sense is that a group of smart, experienced people worked hard to share information about disparate efforts, and to integrate them into a regional and even national conversation. More work, more focus, and still broader participation are needed, but we built on good work already underway.

One strand of conversation very much surprised me. The discussion around deselection and drawdown of duplicative print collections repeatedly turned toward discovery and digital content. This seems ironic in some respects, as shared print initiatives tend to focus first on titles that have never circulated. Why would we be concerned about the discoverability of content that has remained untouched for 10-20 years? On the surface, deselection and discovery seem like mutually exclusive categories. Is there really a need to enhance discovery of something that is being withdrawn or moved to storage for lack of use?

 Well, perhaps there is. As a group, we surfaced several arguments in favor of what I'll call 'remedial discovery.' 
  • Self-fulfilling prophecy: One reason that titles don't circulate may be that they are not found. Users are not always skilled searchers, and even the best cataloging records have a limited number of access points. Cataloging errors may also play a role here (e.g., a misspelling in a primary access field).
  •  Shared print collections limit physical browsing: Paradoxically, the decision to rely on copies that are not held locally in open stacks increases the desire for some form of virtual browsing or enhanced discovery. A user may want to know more about a book before requesting it from another library or from a remote storage facility. The further away the books are, the more desirable virtual browsing appears.
  • Record enhancements have not been universally applied: Many OPACs have taken a page from Amazon to include cover scans, flap copy, and tables of contents. But these enhancements have not been adopted by all libraries, and may not even be available for older titles--those most likely to surface as withdrawal or storage candidates. In some cases, older titles may not have had the same level of exposure as newer titles.
  • Discovery layers are just coming into their own: Discovery tools have improved dramatically in the past few years. The more content that is indexed in those tools, the better the chances a user will find resources that may have been overlooked in the past. Here again, older materials have not benefited from these newer techniques. Perhaps they need another chance, with better tools.
Search: 'you are a faithless mad son of clocks and buzzers'
  • Full-text indexing could stimulate use of older titles: Already the Google Books, Internet Archive, and Hathi Trust interfaces have radically improved the chances of finding older titles -- perhaps older titles that have previously been little used. 

It's an interesting take, and perhaps worth some experimentation. We've come at this topic from other angles previously, in posts on 'patron-driven re-acquisition' and 'curating a discovery environment.' All of this needs to be thought through more carefully, but maybe we ought to consider two simultaneous courses of action once unused titles have been identified.
  1. Continue to draw down highly-redundant print collections in the context of shared print archiving and secure digital collections.
  2. Enhance the remaining records for optimum discoverability. Give them a second chance to benefit from newer discovery tools.
The second of these is somewhat counter-intuitive, since it involves additional investment in a resource that has already cost far more than it has yielded. Some titles may not benefit from the additional work. But it may be worth testing on a small scale. Not only would it level the playing field for older titles, it would provide additional convenience to users examining content remotely. Specific enhancements might include:
  • Add the Hathi Trust public domain URLs (where available) to catalog records for low-use titles
  • Add Tables of Contents to catalog records for all eligible withdrawal candidates
  • Add cover scans, flap copies, and links to reviews for these older, low-use titles
  • Add eBook PDA records for withdrawal or storage candidates
  • Devise a virtual browse function, similar to the Hathi page turner or Amazon 'Look Inside the Book'
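The first enhancement above -- adding Hathi Trust public-domain links to catalog records -- could be largely automated against the HathiTrust Bibliographic API. The sketch below is a minimal illustration in Python; the endpoint URL, field names, and response shape reflect the API's published `volumes/brief` format as best I understand it, and should be treated as assumptions to verify before use.

```python
import json
from urllib.request import urlopen

# HathiTrust Bibliographic API, brief format, keyed by OCLC number
# (assumed endpoint; confirm against current API documentation)
HATHI_BIB_API = "https://catalog.hathitrust.org/api/volumes/brief/oclc/{}.json"

def full_view_urls(api_response: dict) -> list:
    """Return itemURLs for copies HathiTrust exposes as 'Full view'
    (public domain) -- the links worth adding to a catalog record."""
    return [
        item["itemURL"]
        for item in api_response.get("items", [])
        if item.get("usRightsString") == "Full view"
    ]

def lookup(oclc_number: str) -> list:
    """Query the API for one OCLC number (network call)."""
    with urlopen(HATHI_BIB_API.format(oclc_number)) as resp:
        return full_view_urls(json.load(resp))

# Offline example using an illustrative (hypothetical) response:
sample = {
    "records": {"000000001": {"titles": ["An older, low-use title"]}},
    "items": [
        {"htid": "uc1.0001",
         "itemURL": "https://babel.hathitrust.org/cgi/pt?id=uc1.0001",
         "usRightsString": "Full view"},
        {"htid": "uc1.0002",
         "itemURL": "https://babel.hathitrust.org/cgi/pt?id=uc1.0002",
         "usRightsString": "Limited (search-only)"},
    ],
}
print(full_view_urls(sample))  # only the 'Full view' copy qualifies
```

A batch job could run this lookup for every low-use title with an OCLC number and write the resulting URLs into an 856 field.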
No doubt there are other ideas. Some of them will require a good deal of work and investment. There are definitely some trade-offs here, and perhaps the approach must be selective to be affordable. But it's intriguing to think about creating better forms of discovery and access for material that is going offsite or will be held by another library. Lack of browsability is one of faculty's main objections to removing print from central campus stacks. Connecting deselection and enhanced discovery may be one way to answer that.

Tuesday, November 1, 2011

The 'Disapproval Plan' Revisited



In the December 2008 issue of Against the Grain, I introduced a new concept: "The Disapproval Plan: Rules-Based Weeding and Storage Decisions" [pdf]. The article's title was only somewhat tongue-in-cheek. As I tried to demonstrate, selection and deselection represent the same function, performed at different points in a book's lifecycle. At both points, titles are accepted or rejected. Approval plan profiles assure consistent and customizable treatment of newly published titles. Disapproval plan profiles assure consistent and customizable treatment of older titles that have not circulated much. The goal, for most libraries, is to create--and maintain-- an active collection, relevant to the current and future needs of its users.

Approval plans support content acquisition decisions; disapproval plans support storage, weeding, and shared print decisions. Both approval plans and disapproval plans safeguard collection integrity while providing an efficient, reliable alternative to title-by-title scrutiny. Both approval plans and disapproval plans have limitations; both are most appropriately deployed to handle mainstream titles. Subject experts, whether librarians or faculty, remain essential for specialized materials and judgment calls, but rules-based or profile-driven approaches can relieve them of the need to make many obvious and repetitious decisions.

With the benefit of three more years of thinking about collection use, deselection, and shared management of print monograph collections, it's become clear that my initial sketch of the disapproval plan concept can be drastically simplified and refined. The original concept focused on commercial alternatives to locally-owned print, such as Google Books, eBook aggregators, print-on-demand, and the used book market. In retrospect, this approach over-emphasized 're-obtainability' and understated the fundamental importance of archival commitments and operating in the context of the 'collective collection.'

Since then, the emergence of the Hathi Trust digital archive (now containing 5.1 million full-text book titles), and shared print archiving initiatives such as those developed by WEST, ASERL, ReCAP, CRL and the CIC have changed the picture, creating new safeguards and expanding deselection options. The work of Constance Malpas, Lizanne Payne, Paul Courant, OhioLINK/OCLC and others has provided new data and insight on print monographs overlap, storage capacity, and costs.

One key fact has not changed, though. The need remains for an automated tool that assembles relevant deselection metadata, and develops rules to operate against that metadata: a deselection decision-support tool. Now is the time to adapt the disapproval plan to new realities, and to incorporate both archival values and service values into the model. The November 2011 release looks like this:

A 'disapproval plan' is a set of library-defined rules that must accomplish four tasks:

Define the deselection universe. What pool of titles is eligible for deselection consideration? Data elements and distinctions might include:
  • Low-use or no-use titles: These can be identified from circulation, direct borrowing, in-house uses (if captured), and ILL data.
  • Titles owned more than x years: Titles should be given a chance to circulate. Most libraries won't consider withdrawing a title owned for less than 5-10 years. Publication date provides a rough approximation, but leaves recently purchased older imprints in the pool. Acquisition or accession date is much more reliable.
  • Titles widely held elsewhere (see below).
  • Titles that will be kept regardless of use: Works by faculty authors, notable alumni, or Nobel Prize winners, or titles cited in authoritative bibliographies, may need to be exempted.
  • Specific editions or translations: A conservative approach would suggest that matching be conducted with FRBR work families turned off. This retrieves only those holdings that reflect the edition in hand.
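The eligibility rules above lend themselves to a simple predicate. Here is a minimal sketch in Python; the field names, thresholds, and exemption flags are illustrative assumptions, not a prescription for any particular ILS.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Title:
    oclc: str
    accession_year: int          # when acquired, not when published
    circulations: int = 0
    in_house_uses: int = 0
    faculty_author: bool = False
    cited_in_bibliography: bool = False

def in_deselection_universe(t: Title,
                            min_years_owned: int = 10,
                            max_uses: int = 1,
                            today: Optional[date] = None) -> bool:
    """Is this title eligible for deselection consideration?"""
    year = (today or date.today()).year
    if t.faculty_author or t.cited_in_bibliography:
        return False                      # keep regardless of use
    if year - t.accession_year < min_years_owned:
        return False                      # hasn't had time to circulate
    return (t.circulations + t.in_house_uses) <= max_uses

# Example: owned since 2000, one circulation, no exemptions
t = Title(oclc="1234567", accession_year=2000, circulations=1)
print(in_deselection_universe(t, today=date(2012, 1, 1)))  # True
```

Note that the age test runs against accession year, per the point above: publication date alone would wrongly sweep in recently purchased older imprints.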
Assure that withdrawal candidates remain secure.  Once the eligible low-use universe has been defined, the archival security of this content must be gauged. An individual library operating within the academic community must help assure that nothing is lost. While it may not be necessary that a title be held locally, we must satisfy ourselves that it has been secured somewhere. Deselection metadata and rules in support of collection integrity might include:
  • Presence of a print copy safely archived in a trusted repository
  • Presence in Hathi Trust digital archive 
  • Presence in Hathi Trust print archive [under development]
  • Explicit retention commitment for 4-6 copies nationally [MARC 583]
  • Number and distribution of other print holdings nationally, globally, or regionally (shared print archiving)
Assure that withdrawal candidates remain accessible.  Once archiving has been assured in both print and digital form, accessibility comes to the fore. In general, archival copies should not leave the facility where they are secured. Instead, 'service copies' are needed. Deselection profiles need to incorporate this factor, which identifies where usable copies of the content exist and in what form. Here the salient data points are:
  • Presence of a service copy in a regional service center
  • Availability of alternate editions; i.e., same content, different vehicle
  • Availability of commercial eBook versions/PDA records
  • Availability of a print on demand edition
  • Re-purchasable on the used book market 

Enable data-driven decisions to store, withdraw, or retain/curate. Just as approval profiles generate books, notifications, or exclusions when applied, a disapproval plan can support storage, withdrawal, or retention decisions, depending on local needs. Low-use titles that are scarcely held elsewhere might become candidates for preservation or digitization, enabling any library to contribute to the collective collection.
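Taken together, the security and accessibility checks feed the final decision. The rule below is a hypothetical Python sketch of the kind of logic a library might define; the thresholds and inputs are illustrative only, not SCS's actual rules.

```python
def disapproval_decision(us_holdings: int,
                         retention_commitments: int,
                         in_hathi_digital: bool,
                         service_copy_nearby: bool,
                         min_secure_copies: int = 4) -> str:
    """Classify one low-use title as 'withdraw', 'store', or
    'retain/curate'. Thresholds are illustrative assumptions."""
    # Task 2: is the content archivally secure somewhere?
    secured = (retention_commitments >= min_secure_copies
               or (in_hathi_digital and us_holdings >= min_secure_copies))
    if not secured:
        # Scarcely held: a candidate for local preservation/digitization
        return "retain/curate"
    # Task 3: is a usable service copy accessible to patrons?
    if service_copy_nearby:
        return "withdraw"
    # Secured but not conveniently accessible: keep a service copy offsite
    return "store"

print(disapproval_decision(us_holdings=120, retention_commitments=5,
                           in_hathi_digital=True, service_copy_nearby=True))
```

Run over an entire low-use file, a rule like this yields the three candidate lists -- withdraw, store, retain/curate -- that the library then reviews.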

As with selection, deselection work can be done by the library without outside assistance. Reports generated from the ILS, or open-source collection evaluation tools such as the GIST Gifts & Deselection Manager, can be employed. But we suggest there is also room for a vendor-assisted model for deselection and disapproval plans, which is why we founded Sustainable Collection Services (SCS). SCS offers a full-service model for deselection, much as approval vendors do for selection. SCS advises on data extracts; normalizes and validates library-supplied bibliographic, circulation, item, and holdings data; and compares low-use titles to WorldCat, Hathi Trust, and other external sources.

We now offer, in mediated form, the ability to interact with preliminary results and model different combinations of factors: use, time in collection, consortial partner print holdings, existence of a Hathi Trust digital version, number of WorldCat holdings. FRBR on, FRBR off. Our tools support full-library analysis or a focus on specific subjects or locations. Early in 2012, SCS plans to release a Web version of our service that will enable unmediated interaction with deselection metadata. The disapproval plan may yet live and thrive!

A final note: 'disapproval' has negative connotations, but selectivity always implies a mix of acceptance and rejection. Approval profiles always drive as much rejection as acceptance. For English-language books, a large approval plan might supply 20,000 new titles out of 60,000 candidates. The remaining 40,000 new books have in effect been 'disapproved.'

Lack of use over time constitutes disapproval of (or at least indifference toward) a title by patrons. They have chosen other books (or, more likely, other [electronic] resources) to use instead. Disapproval in the form of withdrawal or removal to storage by the library simply reflects the preferences of users, and the security and accessibility of the content elsewhere.

Can we 'neutralize' this term by basing disapproval strictly on data? Probably not. Too bad, because it's a very useful term. These are not bad books, and deselection does not connote disapproval of their content. They are simply not relevant, or no longer relevant, to a specific user community. Selectors face the same decision when first choosing which titles will enter the collection--and with far less data at their disposal. At deselection, there is a track record.