Tuesday, March 29, 2011

The Cost of Deselection (4): Data Comparisons

The time and effort required to acquire and use title-level deselection metadata depend on several factors:
  • What does the library want to know about withdrawal candidates, in addition to the fact that they have not circulated in x years? What deselection metadata is needed?
  • What match points are available in both the library's data and the comparator target?
  • To what degree can comparisons be batched and automated?
To look more closely at this, let's consider a recent project with Grand Valley State University, one of our valued library partners. GVSU's circulating monographs collection includes just over 300,000 titles. At the highest level of analysis, they were interested in creating withdrawal candidate lists based on these criteria:
  • No circulations since 1998
  • Publication year before 2000
  • More than 100 US Holdings
  • More than 10 In-State Holdings (Michigan)
  • Not listed in Resources for College Libraries
  • Never reviewed in CHOICE
GVSU was also interested in titles that might be important to retain or preserve as a contribution to the collective collection. Characteristics of these titles include:
  • Fewer than 10 US holdings
  • Not represented in Hathi Trust
  • No circulations since 1998
  • Publication year before 2000
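Both candidate lists above are, in effect, boolean filters over title-level metadata. A minimal sketch of that logic in Python (field names here are hypothetical, not GVSU's actual data dictionary; it assumes the circulation and holdings counts have already been gathered):

```python
# Sketch of the two candidate filters described above.
# All field names are illustrative assumptions, not a real schema.

def is_withdrawal_candidate(t):
    return (t["circs_since_1998"] == 0     # no circulations since 1998
            and t["pub_year"] < 2000       # publication year before 2000
            and t["us_holdings"] > 100     # more than 100 US holdings
            and t["mi_holdings"] > 10      # more than 10 Michigan holdings
            and not t["in_rcl"]            # not in Resources for College Libraries
            and not t["choice_reviewed"])  # never reviewed in CHOICE

def is_preservation_candidate(t):
    return (t["us_holdings"] < 10          # scarce nationally
            and not t["in_hathi"]          # no copy in Hathi Trust
            and t["circs_since_1998"] == 0
            and t["pub_year"] < 2000)
```

The filters themselves are trivial; as the rest of this post argues, the real cost lies in populating those holdings and list-membership fields for 300,000 titles.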
The cost of developing these criteria has already been considered in the fixed costs portion of our conceptual model. But the actual data comparison incurs additional costs, some of which are variable, growing with the number of titles involved. To generate statistics based on GVSU's criteria, it was first necessary to compare 300,000 titles against several external sources, including:
  • OCLC WorldCat, using the WorldCat API (220 million records)
  • Hathi Trust (4.6 million book titles)
  • CHOICE (156,000 titles)
  • Resources for College Libraries (60,000 titles)
A couple of cost factors come into play here. First, some of the comparator targets require the purchase of a subscription. If these subscriptions are already in place for another purpose, it seems reasonable to omit them from deselection costs. But if subscriptions are added directly in support of deselection, the cost should be included.

The bigger cost, however, is embedded in the work of executing those millions of comparisons and capturing the results. One option, of course, is to search each title manually, but at this volume that is impractical and would ultimately cost an enormous amount in staff time. Remember that all 300,000 items had to be searched in four separate places (now three, since Hathi holdings are represented in WorldCat). It is clearly preferable to match via batch processes, unless the target collection is very small.

A second alternative is to work from a report generated by the library's ILS. Some systems can produce these routinely, but others require local creation of queries or scripts. Even with a good batch of bibliographic data, there are not always mechanisms for matching conveniently against multiple comparator files--these processes must be devised. To interact with the resulting data effectively, it may also be necessary to store results, so that criteria can be modified and new results produced iteratively. At a reasonable scale, much of this can be handled by Excel or Access or their open-source counterparts. But all of this may require a substantial amount of skilled staff time--with the clock running at a minimum of $35/hour.

Even batch-matching costs can vary, depending on what match points are available in both the candidate list and the target data. If the library has OCLC numbers in most records, this speeds matching with WorldCat and Hathi. If the library does not retain OCLC numbers in its local records, LCCN and ISBN matches are possible, but create more work and more exceptions. And some comparator targets, such as CHOICE and RCL, don't include OCLC numbers; different matching routines may be required for a single library.
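The cascade of match points described above can be sketched in a few lines. This is a simplified illustration under stated assumptions (field names like `oclc_number` are placeholders; a production routine would index every available key for each record, not just the best one, and would handle the exception pile far more carefully):

```python
def normalize(value):
    # Strip hyphens and punctuation so "978-0-..." matches "9780..."
    return "".join(ch for ch in str(value) if ch.isalnum()).lower()

def match_key(record):
    """Return the best available match point, tried in order of
    reliability. Field names are illustrative; ILS exports vary."""
    for field in ("oclc_number", "lccn", "isbn"):
        if record.get(field):
            return (field, normalize(record[field]))
    return None  # no usable key: this record lands in the exception pile

def batch_match(candidates, target):
    """Split candidates into (matched, exceptions) against a target file.
    Simplification: a hit requires both records to share the same *best*
    field; a real routine would index every available key."""
    index = {match_key(r) for r in target if match_key(r)}
    matched, exceptions = [], []
    for rec in candidates:
        (matched if match_key(rec) in index else exceptions).append(rec)
    return matched, exceptions
```

Even in this toy form, the cost drivers are visible: every record lacking an OCLC number falls through to weaker keys, and every fall-through adds exceptions that someone must resolve by hand.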

Yet another approach, with its own associated cost, is to use a third-party tool or service such as WorldCat Collection Analysis, Library Dynamics, the GIST GDM, or the one offered by SCS. In various ways, these services allow the library to outsource some portion of the data comparison work--once the necessary data has been made available to the vendor. This has some advantages, but also some costs. As an example of completed data comparison, this SCS Collection Summary shows the preliminary results from GVSU:


In this instance, the combined effect of their deselection criteria resulted in 53,000 withdrawal candidates and 382 preservation candidates. The effect of individual data comparisons is reflected in the bottom section of the chart. The exact cost of completing this sort of analysis remains to be fully understood, as we are just beginning this work. But it is important to see all costs related to batch processes and tools in context. Any project involving more than 5,000 titles will be very time-consuming to do manually. It is also important to remember that we are still gathering information--and that several more steps remain to get the books off the shelves. Next up: selector review and staging.

Links to related posts:

Tuesday, March 22, 2011

The Cost of Deselection (3): Wage Rates

The further I delve into a cost model for deselection, the more multi-faceted it becomes. There will be several more posts in the coming days/weeks. My plan is to use this blog to think through all facets, one step at a time. Once the salient issues have been parsed, I hope to synthesize all of the pieces into a simple, coherent model. Hope is that thing with feathers, right?

First, let's get organized about hourly staff costs. The basis of the following numbers: that old reference chestnut The Occupational Outlook Handbook, produced by the Bureau of Labor Statistics. The following numbers are drawn from the 2010-2011 edition, but they are based on 2008 data. Given the economic situation since then, it seems unlikely the rates would have changed much. In the categories of Librarians, Library Technicians, and Library Assistants, I've used the segment entitled "Colleges, universities, and professional schools" -- this is an academic library cost model. Similar numbers are available for public and school libraries, but for now they are mercifully beyond our scope here.

As noted in yesterday's post, I've converted the annual Librarian salaries to an hourly rate, assuming a 40-hour work week. In all three categories, I've added 30% to the hourly rate to account for benefits, which BLS estimates at 29.2% of salary on average. For student workers, I've assumed that most libraries pay minimum wage ($7.25/hour) and that no benefits are involved; I've rounded up to $8 to account for higher wages paid to longer-serving students.

POSITION             HOURLY WAGE     HOURLY COST (w/ benefits)
Librarian            $26.52          $34.48
Library Technician   $15.91          $20.68
Library Assistant    $12.92          $16.80
Student Workers      $ 8.00          $ 8.00

These are median rates, i.e., the middle value in the distribution of salaries for these categories. This seems a reasonable level at which to build a conceptual model. Clearly wages will vary by region, and to some degree by specialty and longevity. Any application of the model could be modified to account for known variances. The main purpose here, however, is to identify the components of the process, estimate the time invested by each level of staff, and establish a method for estimating overall costs and costs per volume.
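The salary-to-hourly conversion described above is simple enough to reproduce directly. A small sketch (the $55,180 figure is the BLS 2008 median cited in these posts; the 30% benefits load is the rounding stated above):

```python
BENEFITS = 0.30  # BLS estimates benefits at ~29.2% of salary; rounded to 30%

def hourly(annual_salary, benefits=BENEFITS):
    """Convert an annual salary to (hourly wage, hourly cost with
    benefits), assuming a 40-hour week and 52 weeks per year."""
    wage = annual_salary / (52 * 40)
    return wage, wage * (1 + benefits)

wage, cost = hourly(55180)  # BLS 2008 median, academic librarians
print(f"${wage:.2f}/hr wage, ${cost:.2f}/hr with benefits")
```

Computed this way the figures come out a penny higher than the table ($26.53 and $34.49), because the table rounds the hourly wage before applying the benefits load; the difference is immaterial for a conceptual model.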

Comments are welcome. Do these seem like reasonable rates on which to construct the cost model?

Links to related posts:

Sunday, March 20, 2011

The Cost of Deselection (2): Fixed Costs

Any comprehensive costing model for deselection must consider the process from inception to completion. Some components will be fixed costs (largely independent of the volume of titles under review), and others will be variable, increasing or decreasing in proportion to the number of volumes actually handled. Fixed costs may be one-time (if deselection is conceived as a project), or recurring (if deselection is conceived as an ongoing activity).




For the sake of clarity, let's start with the example of a one-time project to remove 20,000 volumes. Today's post will consider some examples of fixed costs:
  • Project design and management
  • Data assessment and extract
  • Development of criteria for candidate lists
  • Communication with stakeholders
Project design and management:  Every deselection project starts somewhere. Sometimes Stacks Management can no longer shelve new titles in a particular range. Sometimes a selector recognizes how old and little-used a segment of titles are. Sometimes an administrator wants to repurpose space. The genesis and scale of the project will determine its objectives, and will suggest who needs to be involved.

For a project involving 20,000 volumes--a figure equivalent to the annual intake of many libraries--a formal plan will be needed. Meetings will be required among affected selectors, stacks management staff, technical services staff, supervisors of student or temporary workers, etc. A communication strategy will need to be developed. Liaison with Facilities may be necessary for removal of deselected volumes. Decisions will need to be made regarding criteria for deselection, record maintenance, timing, and disposition options.

If a group of six people met for two hours once a week for a month to plan the project (a very conservative estimate, in my experience), 48 hours of librarian and staff time would be absorbed in planning. It seems reasonable that the group would continue to meet regularly throughout the course of the project, so we might add another 48 hours for one meeting a month over the next six months. 
Estimated time for project design & management: 100 hours

Data Assessment and Extract: Every library system stores circulation data differently. Individual libraries may count and group circulation statistics differently. Some libraries count in-house use (some as charges in the system, some entirely separately). The extent of circulation data retained by individual libraries typically dates back to the most recent ILS migration. Obviously, analysis can only be done within the constraints of available data.

The accuracy of available data depends on how regularly inventories and shelf-reading have been done. In some cases, holdings data has not been kept current in WorldCat. Each library must decide its tolerance for dealing with less-than-ideal data. In some cases, remediation work may be necessary, but more often that remediation will occur as a welcome side-effect of the deselection process. Item-level data and acquisitions data may be important. For example, "date acquired" is an essential modifier of circulation data--we don't want to deselect titles that have only recently been added. Item records typically contain location, barcode numbers and other elements such as donor information. It is useful to understand what match points may be available for comparison to external data sources (e.g., OCLC number). The perspectives of a Systems Librarian and Technical Services are both important here, to ascertain what data resides where.

Once the project team understands the characteristics of the bib, item, and acq data available for analysis, it needs to be pulled from the ILS in the form of reports or extracts. Again, every ILS provides different approaches to this, ranging from template-based collection management reports to customized SQL queries directly against the library's database tables. Some approaches require more time and expertise than others, particularly if subject or location specificity is wanted, and depending on how fully the bib and item data are integrated. The extracted file of no/low-use items then needs to be formatted for review and for comparison to external data sources.
Estimated time for data assessment and extract: 20 hours (assumes no problems)

Development of criteria for candidate lists: In a benign dictatorship, this step would require no time at all. But that's not how most libraries operate. First, what date range applies to circulation; e.g., are we looking at titles that have not circulated in the past ten years? the past five years? Are we considering titles that circulated once during that period? What publication dates will we consider? Do we need to factor in the date added to the collection as well as imprint date? Are some subject areas off-limits?

If a non-circulating title appears in Resources for College Libraries or is a CHOICE Outstanding Academic Title, does that change our opinion? What number of WorldCat holdings should we use for a first pass--100 in the US? 50 in the US? 10 in our state? Which consortial partners or peer libraries ought to be considered? Are we interested in whether a title appears in Hathi Trust?

For those few libraries that have active deselection programs, many of these questions may already have been worked out. But most libraries will need to spend some time thinking through the collection and access issues here before taking action on a substantial number of titles. Meetings, draft policies, more meetings, and revisions will be necessary. 
Estimated time for development of deselection criteria:  100 hours

Communication with Stakeholders: Few library initiatives attract more attention than removing books from the shelves. To avoid misunderstandings and negative consequences, it is critical that the library deliberately shape its message about drawing down the print collection, and make certain that message is widely communicated and understood. This takes time and patience, and especially during early projects will add to the costs associated with deselection. Articles and blogs need to be written laying out the rationale, and highlighting safeguards. Presentations must be developed and delivered to the faculty at large, to individual departments, and even to students. It may be necessary to make the case to colleagues within the library as well. While the deselection message can be incorporated into liaison programs and other routine channels of communication, in most libraries a more focused effort will be needed. 
Estimated time for communication with stakeholders:  100 hours

The total for these four areas is 320 hours of librarian and administrator time. This is deliberately an extremely conservative estimate -- and comments are most welcome on this point -- but its primary purpose is illustrative. According to the Bureau of Labor Statistics Occupational Outlook Handbook 2010-2011, the median 2008 salary for librarians in colleges, universities, and professional schools was $55,180. Assuming a 40-hour week, this equates to $26.52 per hour. If we add the BLS figure of 30% for benefits, the median hourly rate for a librarian is $34.48. The activities outlined above impose an estimated cost of $11,034 (320 hours × $34.48 = $11,034). Even this modest total equates to $0.55/volume for the 20,000 volumes in our example.
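The arithmetic above can be reproduced in a few lines, which also makes it easy to swap in local hours or wage rates:

```python
HOURLY_COST = 34.48  # librarian hourly cost with benefits, per the wage-rate post
VOLUMES = 20_000     # size of the example project

# Estimated hours from the four fixed-cost areas above
HOURS = {
    "project design & management": 100,
    "data assessment & extract": 20,
    "deselection criteria": 100,
    "stakeholder communication": 100,
}

total_hours = sum(HOURS.values())        # 320
total_cost = total_hours * HOURLY_COST   # $11,033.60
per_volume = total_cost / VOLUMES        # ~$0.55
print(f"{total_hours} hours -> ${total_cost:,.0f} (${per_volume:.2f}/volume)")
# -> 320 hours -> $11,034 ($0.55/volume)
```

A library applying the model would simply replace the `HOURS` estimates and `HOURLY_COST` with its own figures.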

At this point, the library does have a project plan, a handle on the data, a communication strategy, and an early message out to the community. But this is only the beginning. The books are still on the shelves. Comparisons with external data sources have not begun. No deselection decisions have been made. No bib or item records have been touched. Other costs will follow, and I'll continue tracing those in the next post.

Links to related posts:

Friday, March 4, 2011

The Cost of Deselection (1)

Recently there has been a lot of interest in the question of how much it costs to remove a book from the library. Judging from the listserv traffic, no one yet has a complete answer. A couple of relevant comments from librarians who have captured a portion of the cost:

From Steve Bosch at the University of Arizona:
We have measured the amount of time required to withdraw materials. Since we have removed over 350,000 vols the number seems to hold up for our planning purposes. This is an average for all types of withdraws from serials with multiple items to books. Over the years the avg has been about 6 min per title. This includes pulling the item from the stacks – updating local records, updating OCLC holdings, and the physical processing of the de-accessioned item.  
From Martha Hruska at UC/San Diego:
When we can do large batch processing of withdrawals, our Facilities manager estimates that the pulling, stamping, boxing, staging, and transporting to Surplus Sales costs approx. 25 cents per volume.  When we need to pull up records one by one using student help, the price doubles to 50 cents per volume (on average).
These are both useful indicators, but of course represent only a portion of the overall cost of deselection. They also measure things differently:
  • U of A: 6 minutes per title (or 10 titles per hour)--does not appear to include transport.
  • UCSD: 25-50 cents per volume (including transport).
These estimates are actually quite far apart. If we assume $10/hour for labor, U of A's cost per volume would be $1.00. If we assume the same hourly rate for UCSD (with students pulling up records one by one), they could process 20 titles/hour--more than double U of A's rate--for the same cost. Under the UCSD batch process, they process 40 titles/hour. And the UCSD process appears to include more steps. There is clearly much more to learn here.
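The back-of-the-envelope comparison above is easy to make explicit. A sketch (the $10/hour labor rate is the assumption stated in the text, not a figure from either library):

```python
LABOR_RATE = 10.0  # assumed $/hour, as in the text above

# U of A reports 6 minutes per title
ua_titles_per_hour = 60 / 6                     # 10 titles/hour
ua_cost_per_volume = LABOR_RATE / ua_titles_per_hour  # $1.00/volume

# UCSD reports cost per volume; back out the implied throughput
ucsd_single = LABOR_RATE / 0.50                 # 20 titles/hour, one-by-one
ucsd_batch = LABOR_RATE / 0.25                  # 40 titles/hour, batch

print(f"U of A: {ua_titles_per_hour:.0f} titles/hr, ${ua_cost_per_volume:.2f}/vol")
print(f"UCSD:   {ucsd_single:.0f} titles/hr one-by-one, {ucsd_batch:.0f} titles/hr batch")
```

Note how sensitive the comparison is to the assumed labor rate: the two estimates converge or diverge depending on who is doing the work and what they are paid, which is exactly why a cost model needs explicit wage tiers.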

These back-of-the-envelope calculations point to a clear need for a comprehensive cost model for deselection, which would need to include the intellectual work (data gathering and deselection decision-making), as well as normalizing the physical handling and record maintenance tasks captured in the examples above:
  • Identification of no/low-circulation titles
  • Determining/negotiating parameters for deselection  (imprint date ranges, title protection rules)
  • Identification of holdings by consortial partners or WorldCat
  • Staging titles for physical review, condition comparison
  • Selector time/Faculty time spent in review of lists and physical items
  • Error correction--books that don't match records, etc.
It is only after these steps are performed that the cost models described above come into play, and even here there is more than one variable in each step:
  • Physical Handling
    • Pulling from shelves or staging area
    • Library stacks vs various storage facilities
  • Record Maintenance
    •  Local bib and item record updates
    •  OCLC holdings updates
    • Insertion of  URL to digital version
  • Disposition Options
    • Packing and staging
    • Shipping 
    • Recycling
    • Selling/Donating
Not every library will incorporate all of these steps, and of course that is the point. Library practice varies widely here, and so do costs. Some libraries have good batch update capabilities; others less so. Disposition options range from boxing for sale or transport, to dropping items into a recycling bin. Handling costs depend on which option the library chooses. Some libraries support staging and physical review; others work from lists. Pulling from stacks or open storage shelves is much less labor-intensive than pulling from bins in a high-density storage facility.

In short, deselection policies, workflow designs, and systems capabilities dramatically influence transaction costs for deselection. In subsequent posts, I will outline a conceptual model for estimating deselection costs that can accommodate these variables. [Update: Links to other posts follow:]

Monday, February 28, 2011

Title Protection Rules

From C-Pirate Flickr Stream
No one likes to discard books, no matter how much sense it makes. Psychology and emotion loom larger in deselection decisions than do data and reason. Prompted by my partner Ruth Fischer, we've recently begun experimenting with a different way of thinking about this, one that shifts the focus from rejection to protection. The question "what must we save?" is proving far more productive and positive than the question "what do we have to remove?"

The change in orientation is partly rhetorical, but also has a basis in reality. The change requires that overall responsibility for deselection be assigned at the institutional level. In some cases, that may mean the library as a whole, rather than collection managers; in others, it may mean the College or University which is pressing for additional space or reduced costs, rather than the library.

The institution needs more space for users. Large portions of the print collection are not being used. An obvious solution is to store or discard those many low/no-circulation items. The institution decrees that deselection must be pursued. Titles that have not circulated in many years automatically become candidates for deselection. The institution as a whole bears responsibility for this decision, and affirms the general direction and parameters.

This general candidate list, however, is only the starting point. Some categories of titles may need to be protected, regardless of how little they are used. This is where the subject librarian's work begins, and where a new psychology can be adopted. Instead of active deselection, the process becomes a form of triage: protecting those items that most need protecting. It is a given that not all can be saved; the institution has said so. The subject librarian, then, must determine which categories of material are most important to retain.

In SCS parlance, these decisions are expressed as "title protection rules." Title protection rules allow some titles to be exempted from deselection, but also force prioritization. As importantly, they shift the energy of selectors toward preservation rather than elimination, but in the context of an institutionally mandated deselection project. Progress can be made, but the most critical exceptions can also be honored.

Certain title protection rules come up with regularity in our discussions with libraries. As described in a previous post, appearance on authoritative lists might inspire retention. Award-winners (Nobel, Pulitzer, National Book Awards, etc.) and "classic" or "seminal" works in a subject are other general categories that may warrant protection. But other, more localized exceptions are also common:
  • Titles written by faculty members
  • Books or collections donated by important alumni or benefactors
  • Books that were part of the library's founding collection
  • Titles in areas where the collection is known to be weak
  • Titles important to emerging disciplines on campus
  • Areas where retrospective collection building has recently occurred.
  • Titles with high levels of image-intensity (e.g., arts)
  • Titles from publishers well-regarded in a discipline
  • Titles in series important to a discipline

These are just examples, of course. Every library (and potentially every discipline represented in the library's collection) is likely to require its own title protection rules. But even from this modest list, the potential variety is clear. Perhaps less obvious is the difficulty of shaping these criteria into effective rules, and avoiding the need for title-by-title decisions. And even if a rule can be defined, the necessary data may not be readily available.

For instance, in order to identify and protect works by faculty authors, a list of faculty authors is needed. Where can this be generated? Should staff as well as faculty authors be included? Do we limit the list to current faculty, or attempt to capture historical contributions as well? To match the author list against the library's catalog, some authority control work may be necessary. Identifying titles donated by well-known alumni or purchased from endowed funds can create similar logistical problems, especially when the only indicators are a physical bookplate or the use of a specific fund.

Some of these are solvable problems, and the degree of effort involved in finding a solution may provide one more measure of the ultimate value of these titles to the library. Articulating title protection rules is an important step in that process.

Tuesday, February 15, 2011

Core Titles and Circular Logic

Here at Sustainable Collections Services, we are working with several libraries to identify low/no-circulation titles in their collections. We then gather additional information about those titles, to help inform deselection decisions. It has been interesting to learn what sorts of supplementary information are most important. Some data needs are obvious, such as the number of holdings in WorldCat, or whether a title already resides in a shared regional storage facility.

But other information is often wanted. For instance, there is a great fear of discarding a title of recognized value, and many librarians wish to know whether a title appears on some form of "authoritative list." Examples of such sources are Resources for College Libraries, CHOICE's Outstanding Academic Titles, and Doody's Core Titles in the Health Sciences.

These lists have been developed to help libraries identify the most important titles --the core titles--for specific types of collections. Criteria for inclusion vary depending on audience level and discipline, but CHOICE's list illustrates one well-defined set:


A title's presence on such lists often--and in many cases appropriately--affects the decision to deselect. To accommodate this, SCS introduced the concept of "title protection" rules, which describe categories of books that are exempt from withdrawal--regardless of their circulation history. There are many other types of title protection rules (e.g., faculty authors, donations from prominent alumni), which I will describe in a subsequent post. But title protection rules based on authoritative lists present an interesting conundrum.

Core titles are, by definition, books that every library should have. Not surprisingly, they tend to be widely held. They have been deemed valuable by an external, expert reviewing authority. This judgment is subsequently reinforced by collective agreement, expressed through widespread acquisition. These titles are well-regarded and so are widely bought. They remain well-regarded because they are widely held. But this cycle of logic does not address the question of use.

Authoritative lists are excellent collection development tools, assuring that the most important titles in a discipline are represented in the collection. But what sort of deselection tools are they? Core titles are not only the most widely held, but are typically the most easily re-acquired, and the most likely to be available in digital form. But are they also the most widely used? If a "core" title has not circulated in 15 years, how do we weigh that fact against its designation as a core title? So far, most libraries seem inclined to protect these titles from deselection, regardless of use patterns.

This may be exactly the right decision, but there is also an oddly circular logic at work. This title was deemed important, so many libraries bought it. Its designation as a core title inspires librarians to protect it from deselection. Core lists are one element in evaluating collections. We want our collection to measure up. Other libraries are keeping core titles. We will keep core titles.

image from shirtaday.com
Absent consideration of use, this logic will lead us (as a community) to retain hundreds or thousands of copies of the same titles--simply because they appear on the same lists that inspired us to acquire them in the first place. If they are indeed well used, this is a good thing. But if they are little-used, we will miss an opportunity to release thousands of feet of shelf space with virtually no risk.

We at SCS are curious whether "core" titles experience higher circulation rates than other titles, and hope to quantify that in some fashion in the near future. They certainly should, given their acknowledged quality. It remains important to know which titles appear on these and other authoritative lists (such as those used for accreditation), but as one of our focus group participants put it back in January: "I'm starting to think that use trumps everything." In our view, it doesn't necessarily trump everything, but it should always be part of the data that drives deselection decisions.

Thursday, February 10, 2011

Misspent Funds or Strategic Reserve?

My colleague Andy Breeding recently forwarded a Cornell University report from its Task Force on Print Collection Usage. The report, released in November 2010, resulted from the Task Force's charge "to conduct a wide-ranging study on the use of the circulating print collections." It is one of the most thorough studies of its kind. The findings, though perhaps not surprising, were nonetheless stark. Among them (with emphasis added):
Approximately 55% of the books published since 1990 and held in Cornell's collections have never circulated.

For books published in 2001, 64.5% had not circulated by the end of 2009.

Of books in circulation on April 19, 2010, only 10.7% were charged out to undergraduates.
Cornell, as a research library with a mission to collect for both current and future scholars in its community, is careful to state that the import of these numbers is far from clear. The report demonstrates how much use varies by discipline (with Math circulating a higher percentage of its holdings than any other discipline), and the Task Force argues forcefully that no "one size fits all" solution exists. They emphasize the need for greater understanding of the data before action. In their words:
  • High or low circulation rates should not be attributed to a single straightforward cause, particularly in light of wide variation in the role of print monographs in different disciplines.
  • The Library should not adopt specific across-the-board targets for the circulation rate of print monographs acquired for the collection.
  • The Library should not halt or diminish acquisitions in particular non-English languages absent a detailed understanding of language distribution among the disciplines and across the broad patron base on campus.
This is good stewardship. But the competing priorities faced by Cornell--and by other research libraries--are evident in their own questions:
"If half of CUL's monograph purchases of the last twenty years have circulated, is that a lot or a little? Precious resources are being spent to purchase, house, and preserve these books, but to what extent should this be regarded as misspent funds and to what extent as investment in a strategic reserve?"
At the heart of this distinction between misspent funds and investment in a strategic reserve lie a number of thorny issues. How should we value use? How do we balance the budgetary pressures of the present against responsibility to the future? And perhaps most importantly, who should bear the cost of a strategic reserve?

The US Strategic Petroleum Reserve
It's instructive to consider the language used around other strategic reserves, such as the Strategic Petroleum Reserve. Phrases such as "guarding against an interruption in supply", "emergency stockpile", "maintain readiness for emergency use", and "to cope with unexpected events" are common. At bottom, however, they can be reduced to a single concept: "just in case." This is the very phrase most often used to describe the philosophy of academic library collections, at least until recent years. It is an important role, and it is an expensive role.

A strategic reserve of both print and digital scholarship seems an obvious choice. But like the Strategic Petroleum Reserve, this should be coordinated at the national or regional level, and the costs should be borne by the entire community that depends upon that reserve. As a community, we have begun to move in this direction, through participation in trusted print repositories and trusted digital repositories such as Hathi Trust. Investment in these programs, through both dollars and contributed collections, will gradually assure that "misspent funds" are converted to something more lasting and cost-effective.