Tuesday, March 29, 2011

The Cost of Deselection (4): Data Comparisons

The time and effort required to acquire and use title-level deselection metadata depend on several factors:
  • What does the library want to know about withdrawal candidates, in addition to the fact that they have not circulated in x years? What deselection metadata is needed?
  • What match points are available in both the library's data and the comparator target?
  • To what degree can comparisons be batched and automated?
To look more closely at this, let's consider a recent project with Grand Valley State University, one of our valued library partners. GVSU's circulating monographs collection includes just over 300,000 titles. At the highest level of analysis, they were interested in creating withdrawal candidate lists based on these criteria:
  • No circulations since 1998
  • Publication year before 2000
  • More than 100 US Holdings
  • More than 10 In-State Holdings (Michigan)
  • Not listed in Resources for College Libraries
  • Never reviewed in CHOICE
GVSU was also interested in titles that might be important to retain or preserve as a contribution to the collective collection. Characteristics of these titles include (a small sketch of how both sets of criteria might be applied follows this list):
  • Fewer than 10 US holdings
  • Not represented in Hathi Trust
  • No circulations since 1998
  • Publication year before 2000
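To make the data requirements concrete, here is a minimal sketch of how criteria like these might be applied once the title-level deselection metadata has been assembled in one place. The field names (last_circ_year, us_holdings, in_rcl, and so on) are illustrative assumptions, not GVSU's or SCS's actual data structures.

```python
def no_recent_circ(t):
    """True if the title has not circulated since 1998 (or has never circulated)."""
    return t["last_circ_year"] is None or t["last_circ_year"] < 1998

def is_withdrawal_candidate(t):
    """Withdrawal criteria along the lines GVSU used."""
    return (no_recent_circ(t)
            and t["pub_year"] < 2000
            and t["us_holdings"] > 100
            and t["mi_holdings"] > 10
            and not t["in_rcl"]
            and not t["choice_reviewed"])

def is_preservation_candidate(t):
    """Titles that may merit retention as a contribution to the collective collection."""
    return (no_recent_circ(t)
            and t["pub_year"] < 2000
            and t["us_holdings"] < 10
            and not t["in_hathi"])

def build_candidate_lists(titles):
    """Split a list of title records into withdrawal and preservation candidates."""
    withdraw = [t for t in titles if is_withdrawal_candidate(t)]
    preserve = [t for t in titles if is_preservation_candidate(t)]
    return withdraw, preserve
```

The hard part, of course, is not this filtering logic but populating fields like us_holdings, in_rcl, and in_hathi for 300,000 titles--which is where the data comparisons below come in.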
The cost of developing these criteria has already been considered in the fixed costs portion of our conceptual model. But the actual data comparison incurs additional costs, some of which are variable, growing with the number of titles involved. To generate statistics based on GVSU's criteria, it was first necessary to compare 300,000 titles against several external sources, including:
  • OCLC WorldCat, using the WorldCat API (220 million records)
  • Hathi Trust (4.6 million book titles)
  • CHOICE (156,000 titles)
  • Resources for College Libraries (60,000 titles)
A couple of cost factors come into play here. First, some of the comparator targets require the purchase of a subscription. If these subscriptions are already in place for another purpose, it seems reasonable to omit them from deselection costs. But if subscriptions are added directly in support of deselection, the cost should be included.

The bigger cost, however, is embedded in the work of executing those millions of comparisons and capturing the results. One option, of course, is to search each title manually, but at this volume that is impractical and would ultimately cost an enormous amount in staff time. Remember that all 300,000 titles had to be searched in four separate places (now three, since Hathi holdings are represented in WorldCat). Unless the target collection is very small, it is clearly preferable to match via batch processes.

A second alternative is to work from a report generated by the library's ILS. Some systems can produce these routinely, but others require local creation of queries or scripts. Even with a good batch of bibliographic data, there are not always mechanisms for matching conveniently against multiple comparator files--these processes must be devised. To interact with the resulting data effectively, it may also be necessary to store results, so that criteria can be modified and new results produced iteratively. At a reasonable scale, much of this can be handled by Excel or Access or their open-source counterparts. But all of this may require a substantial amount of skilled staff time--with the clock running at a minimum of $35/hour.
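For libraries devising those processes themselves, the core of the batch comparison is conceptually simple: load each comparator target into a lookup set keyed on a shared identifier, then flag every title in the ILS extract against each set. Here is a minimal sketch assuming the comparator data is available locally as CSV files keyed on OCLC number; the file names and column headings are hypothetical.

```python
import csv

def load_id_set(path, id_column):
    """Read one comparator file into a set of identifiers for fast lookup."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[id_column].strip() for row in csv.DictReader(f) if row.get(id_column)}

# Hypothetical local copies of the comparator targets, keyed on OCLC number.
rcl_ids    = load_id_set("rcl_titles.csv", "oclc_number")
choice_ids = load_id_set("choice_reviews.csv", "oclc_number")
hathi_ids  = load_id_set("hathi_volumes.csv", "oclc_number")

with open("ils_extract.csv", newline="", encoding="utf-8") as src, \
     open("comparison_results.csv", "w", newline="", encoding="utf-8") as out:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["in_rcl", "in_choice", "in_hathi"])
    writer.writeheader()
    for row in reader:
        oclc = (row.get("oclc_number") or "").strip()
        row["in_rcl"]    = oclc in rcl_ids
        row["in_choice"] = oclc in choice_ids
        row["in_hathi"]  = oclc in hathi_ids
        writer.writerow(row)
```

The results file can then be filtered and re-filtered in Excel, Access, or their open-source counterparts as the criteria change--but someone still has to write, test, and run scripts like this, and that is exactly the skilled staff time the clock is running on.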

Even batch-matching costs can vary, depending on what match points are available in both the candidate list and the target data. If the library has OCLC numbers in most records, this speeds matching with WorldCat and Hathi. If the library does not retain OCLC numbers in its local records, LCCN and ISBN matches are possible, but create more work and more exceptions. And some comparator targets, such as CHOICE and RCL, don't include OCLC numbers; different matching routines may be required for a single library.
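In code, that fallback usually looks like a short cascade of lookups, with anything that falls through all of them landing on an exception list for manual searching. A hedged sketch; the field names and the structure of the target indexes are assumptions.

```python
def match_title(record, indexes):
    """Match one local record against one comparator target.

    `indexes` is assumed to hold one lookup table per identifier type,
    e.g. {"oclc": {...}, "lccn": {...}, "isbn": {...}}, built from the target.
    Returns (matched_record, match_point) or (None, None).
    """
    oclc = record.get("oclc_number")
    if oclc and oclc in indexes["oclc"]:
        return indexes["oclc"][oclc], "oclc"
    lccn = record.get("lccn")
    if lccn and lccn in indexes["lccn"]:
        return indexes["lccn"][lccn], "lccn"
    for isbn in record.get("isbns", []):
        if isbn in indexes["isbn"]:
            return indexes["isbn"][isbn], "isbn"
    return None, None  # unmatched: add to the exception list for manual review
```

Each additional match point adds a little more code and, more importantly, a larger pile of exceptions to resolve by hand.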

Yet another approach, with its own associated cost, is to use a third-party tool or service such as WorldCat Collection Analysis, Library Dynamics, the GIST GDM, or the one offered by SCS. In various ways, these services allow the library to outsource some portion of the data comparison work--once the necessary data has been made available to the vendor. This has some advantages, but also some costs. As an example of completed data comparison, this SCS Collection Summary shows the preliminary results from GVSU:


In this instance, the combined effect of their deselection criteria resulted in 53,000 withdrawal candidates and 382 preservation candidates. The effect of individual data comparisons is reflected in the bottom section of the chart. The exact cost of completing this sort of analysis remains to be fully understood, as we are just beginning this work. But it is important to see all costs related to batch processes and tools in context. Any project involving more than 5,000 titles will be very time-consuming to do manually. It is also important to remember that we are still gathering information--and that several more steps remain to get the books off the shelves. Next up: selector review and staging.


Tuesday, March 22, 2011

The Cost of Deselection (3): Wage Rates

The further I delve into a cost model for deselection, the more multi-faceted it becomes. There will be several more posts in the coming days/weeks. My plan is to use this blog to think through all facets, one step at a time. Once the salient issues have been parsed, I hope to synthesize all of the pieces into a simple, coherent model. Hope is that thing with feathers, right?

First, let's get organized about hourly staff costs. The basis of the following numbers: that old reference chestnut The Occupational Outlook Handbook, produced by the Bureau of Labor Statistics. The following numbers are drawn from the 2010-2011 edition, but they are based on 2008 data. Given the economic situation since then, it seems unlikely the rates would have changed much. In the categories of Librarians, Library Technicians, and Library Assistants, I've used the segment entitled "Colleges, universities, and professional schools" -- this is an academic library cost model. Similar numbers are available for public and school libraries, but for now they are mercifully beyond our scope here.

As noted in yesterday's post, I've converted the annual Librarian salaries to an hourly rate, assuming a 40-hour work week. In all three categories, I've added 30% to the hourly rate to account for benefits, which BLS estimates at 29.2% of salary on average. For student workers, I've assumed that most libraries pay minimum wage ($7.25/hour) and that no benefits are involved; I've rounded up to $8 to account for higher wages paid to longer-serving students.

POSITION               HOURLY WAGE      HOURLY COST (w/ benefits)
Librarian              $26.52           $34.48
Library Technician     $15.91           $20.68
Library Assistant      $12.92           $16.80
Student Workers        $ 8.00           $ 8.00

These are median rates; i.e., the middle value in the distribution of salaries in each category. This seems a reasonable level at which to build a conceptual model. Clearly wages will vary by region, and to some degree by specialty and longevity. Any application of the model could be modified to account for known variances. The main purpose here, however, is to identify the components of the process, estimate the time invested by each level of staff, and establish a method for estimating overall costs and costs per volume.
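For anyone substituting local salary data, the conversion behind the table takes only a few lines. A minimal sketch using the assumptions stated above (40-hour week, 30% benefits load); the salary figure is the BLS 2008 median for academic librarians cited in this post.

```python
HOURS_PER_YEAR = 40 * 52     # 40-hour week, 52 weeks = 2,080 hours
BENEFITS_LOAD  = 0.30        # rounded up from the BLS benefits estimate of 29.2%

def hourly_cost(annual_salary, benefits=BENEFITS_LOAD):
    """Convert an annual salary to an hourly wage and a loaded hourly cost."""
    wage = annual_salary / HOURS_PER_YEAR
    return wage, wage * (1 + benefits)

wage, loaded = hourly_cost(55180)    # BLS 2008 median, academic librarians
print(f"Librarian: ${wage:.2f}/hour wage, ${loaded:.2f}/hour with benefits")
# Prints roughly $26.53 and $34.49; the table rounds the wage to $26.52 before
# adding benefits, which yields the $34.48 shown above.
```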

Comments are welcome. Do these seem like reasonable rates on which to construct the cost model?


Sunday, March 20, 2011

The Cost of Deselection (2): Fixed Costs

Any comprehensive costing model for deselection must consider the process from inception to completion. Some components will be fixed costs (largely independent of the volume of titles under review), and others will be variable, increasing or decreasing in proportion to the number of volumes actually handled. Fixed costs may be one-time (if deselection is conceived as a project), or recurring (if deselection is conceived as an ongoing activity).




For the sake of clarity, let's start with the example of a one-time project to remove 20,000 volumes. Today's post will consider some examples of fixed costs:
  • Project design and management
  • Data assessment and extract
  • Development of criteria for candidate lists
  • Communication with stakeholders
Project design and management:  Every deselection project starts somewhere. Sometimes Stacks Management can no longer shelve new titles in a particular range. Sometimes a selector recognizes how old and little-used a segment of titles is. Sometimes an administrator wants to repurpose space. The genesis and scale of the project will determine its objectives and suggest who needs to be involved.

For a project involving 20,000 volumes--a figure equivalent to the annual intake of many libraries--a formal plan will be needed. Meetings will be required among affected selectors, stacks management staff, technical services staff, supervisors of student or temporary workers, etc. A communication strategy will need to be developed. Liaison with Facilities may be necessary for removal of deselected volumes. Decisions will need to be made regarding criteria for deselection, record maintenance, timing, and disposition options.

If a group of six people met for two hours once a week for a month to plan the project (a very conservative estimate, in my experience), 48 hours of librarian and staff time would be absorbed in planning. It seems reasonable that the group would continue to meet regularly throughout the course of the project, so we might add another 48 hours for one meeting a month over the next six months. 
Estimated time for project design & management: 100 hours

Data Assessment and Extract: Every library system stores circulation data differently. Individual libraries may count and group circulation statistics differently. Some libraries count in-house use (some as charges in the system, some entirely separately). The extent of circulation data retained by individual libraries typically dates back to the most recent ILS migration. Obviously, analysis can only be done within the constraints of available data.

The accuracy of available data depends on how regularly inventories and shelf-reading have been done. In some cases, holdings data has not been kept current in WorldCat. Each library must decide its tolerance for dealing with less-than-ideal data. In some cases, remediation work may be necessary, but more often that remediation will occur as a welcome side-effect of the deselection process. Item-level data and acquisitions data may be important. For example, "date acquired" is an essential modifier of circulation data--we don't want to deselect titles that have only recently been added. Item records typically contain location, barcode numbers and other elements such as donor information. It is useful to understand what match points may be available for comparison to external data sources (e.g., OCLC number). The perspectives of a Systems Librarian and Technical Services are both important here, to ascertain what data resides where.

Once the project team understands the characteristics of the bib, item, and acq data available for analysis, it needs to be pulled from the ILS in the form of reports or extracts. Again, every ILS provides different approaches to this, ranging from template-based collection management reports to customized SQL queries directly against the library's database tables. Some approaches require more time and expertise than others, depending on whether subject or location specificity is wanted and on how fully the bib and item data are integrated. The extracted file of no/low-use items then needs to be formatted for review and for comparison to external data sources.
Estimated time for data assessment and extract: 20 hours (assumes no problems)
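Whatever the system, the extract usually reduces to some variant of "bib records joined to item records, filtered on last circulation date and date acquired." A hedged illustration against an invented schema--the table names, column names, and date cutoffs below are assumptions, not any particular ILS's structure.

```python
import sqlite3  # stand-in only; in practice this is the ILS or its reporting database

# Hypothetical schema; every ILS names these tables and columns differently.
LOW_USE_EXTRACT = """
    SELECT b.bib_id, b.oclc_number, b.title, b.pub_year,
           i.barcode, i.location, i.date_acquired, i.last_circ_date
    FROM   bib_record  AS b
    JOIN   item_record AS i ON i.bib_id = b.bib_id
    WHERE  i.date_acquired < '2005-01-01'                 -- skip recent additions
      AND (i.last_circ_date IS NULL
           OR i.last_circ_date < '1998-01-01')            -- no circulation since 1998
    ORDER  BY i.location, b.title
"""

conn = sqlite3.connect("ils_snapshot.db")   # hypothetical local snapshot of the ILS data
rows = conn.execute(LOW_USE_EXTRACT).fetchall()
```

Writing, validating, and re-running a query like this is where much of the estimated 20 hours goes--more, if the bib and item data turn out to be messier than expected.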

Development of criteria for candidate lists: In a benign dictatorship, this step would require no time at all. But that's not how most libraries operate. First, what date range applies to circulation; e.g., are we looking at titles that have not circulated in the past ten years? the past five years? Are we considering titles that circulated once during that period? What publication dates will we consider? Do we need to factor in the date added to the collection as well as imprint date? Are some subject areas off-limits?

If a non-circulating title appears in Resources for College Libraries or is a CHOICE Outstanding Academic Title, does that change our opinion? What number of WorldCat holdings should we use for a first pass--100 in the US? 50 in the US? 10 in our state? Which consortial partners or peer libraries ought to be considered? Are we interested in whether a title appears in Hathi Trust?

For those few libraries that have active deselection programs, many of these questions may already have been worked out. But most libraries will need to spend some time thinking through the collection and access issues here before taking action on a substantial number of titles. Meetings, draft policies, more meetings, and revisions will be necessary. 
Estimated time for development of deselection criteria:  100 hours

Communication with Stakeholders: Few library initiatives attract more attention than removing books from the shelves. To avoid misunderstandings and negative consequences, it is critical that the library deliberately shape its message about drawing down the print collection, and make certain that message is widely communicated and understood. This takes time and patience, and especially during early projects will add to the costs associated with deselection. Articles and blogs need to be written laying out the rationale, and highlighting safeguards. Presentations must be developed and delivered to the faculty at large, to individual departments, and even to students. It may be necessary to make the case to colleagues within the library as well. While the deselection message can be incorporated into liaison programs and other routine channels of communication, in most libraries a more focused effort will be needed. 
Estimated time for communication with stakeholders:  100 hours

The total for these four areas is 320 hours of librarian and administrator time. This is deliberately an extremely conservative estimate -- and comments are most welcome on this point -- but its primary purpose is illustrative. According to the Bureau of Labor Statistics Occupational Outlook Handbook 2010-2011, the median 2008 salary for librarians in colleges, universities, and professional schools was $55,180. Assuming a 40-hour week, this equates to $26.52 per hour. If we add the BLS figure of 30% for benefits, the median hourly rate for a librarian is $34.48. The activities outlined above impose an estimated cost of $11,034 (320 hours x $34.48). Even this modest total equates to $0.55/volume for the 20,000 volumes in our example.
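Laid out as a calculation, using only the figures already given in this post:

```python
fixed_hours    = 100 + 20 + 100 + 100   # the four estimates above: 320 hours
librarian_rate = 34.48                  # median hourly cost with benefits
volumes        = 20000                  # size of the example project

fixed_cost = fixed_hours * librarian_rate   # 320 x $34.48 = $11,033.60
per_volume = fixed_cost / volumes           # about $0.55 per volume
print(f"Fixed costs: ${fixed_cost:,.0f}, or ${per_volume:.2f} per volume")
```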

At this point, the library does have a project plan, a handle on the data, a communication strategy, and an early message out to the community. But this is only the beginning. The books are still on the shelves. Comparisons with external data sources have not begun. No deselection decisions have been made. No bib or item records have been touched. Other costs will follow, and I'll continue tracing those in the next post.


Friday, March 4, 2011

The Cost of Deselection (1)

Recently there has been a lot of interest in the question of how much it costs to remove a book from the library. Judging from the listserv traffic, no one yet has a complete answer. A couple of relevant comments from librarians who have captured a portion of the cost:

From Steve Bosch at the University of Arizona:
We have measured the amount of time required to withdraw materials. Since we have removed over 350,000 vols the number seems to hold up for our planning purposes. This is an average for all types of withdraws from serials with multiple items to books. Over the years the avg has been about 6 min per title. This includes pulling the item from the stacks – updating local records, updating OCLC holdings, and the physical processing of the de-accessioned item.  
From Martha Hruska at UC/San Diego:
When we can do large batch processing of withdrawals, our Facilities manager estimates that the pulling, stamping, boxing, staging, and transporting to Surplus Sales costs approx. 25 cents per volume.  When we need to pull up records one by one using student help, the price doubles to 50 cents per volume (on average).
These are both useful indicators, but of course represent only a portion of the overall cost of deselection. They also measure things differently:
  • U of A: 6 minutes per title (or 10 titles per hour); does not appear to include transport.
  • UCSD: 25-50 cents per volume (including transport).
These estimates are actually quite far apart. If we assume $10/hour for labor, U of A's cost per volume would be $1.00. If we assume the same hourly rate for UCSD (with students pulling up records one by one), they could process 20 titles/hour--double U of A's rate--for the same cost. Under the UCSD batch process, they process 40 titles/hour. And the UCSD process appears to include more steps. There is clearly much more to learn here.
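The arithmetic behind that comparison, with the $10/hour labor rate shown explicitly as an assumption:

```python
LABOR_RATE = 10.00   # assumed $/hour for the staff doing the work

# U of A: 6 minutes per title
ua_titles_per_hour = 60 / 6                             # 10 titles/hour
ua_cost_per_volume = LABOR_RATE / ua_titles_per_hour    # $1.00/volume

# UCSD reports cost per volume directly; invert it to get throughput
ucsd_item_by_item = LABOR_RATE / 0.50                   # 20 titles/hour
ucsd_batch        = LABOR_RATE / 0.25                   # 40 titles/hour
```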

These back-of-the-envelope calculations point to a clear need for a comprehensive cost model for deselection, which would need to include the intellectual work (data gathering and deselection decision-making), as well as normalizing the physical handling and record maintenance tasks captured in the examples above:
  • Identification of no/low-circulation titles
  • Determining/negotiating parameters for deselection  (imprint date ranges, title protection rules)
  • Identification of holdings by consortial partners or WorldCat
  • Staging titles for physical review, condition comparison
  • Selector time/Faculty time spent in review of lists and physical items
  • Error correction--books that don't match records, etc.
It is only after these steps are performed that the cost models described above come into play, and even here there is more than one variable in each step:
  • Physical Handling
    • Pulling from shelves or staging area
    • Library stacks vs various storage facilities
  • Record Maintenance
    •  Local bib and item record updates
    •  OCLC holdings updates
    • Insertion of a URL to the digital version
  • Disposition Options
    • Packing and staging
    • Shipping 
    • Recycling
    • Selling/Donating
Not every library will incorporate all of these steps, and of course that is the point. Library practice varies widely here, and so do costs. Some libraries have good batch update capabilities; others less so. Disposition options range from boxing for sale or transport, to dropping items into a recycling bin. Handling costs depend on which option the library chooses. Some libraries support staging and physical review; others work from lists. Pulling from stacks or open storage shelves is much less labor-intensive than pulling from bins in a high-density storage facility.

In short, deselection policies, workflow designs, and systems capabilities dramatically influence transaction costs for deselection. In subsequent posts, I will outline a conceptual model for estimating deselection costs that can accommodate these variables.