- What does the library want to know about withdrawal candidates, in addition to the fact that they have not circulated in x years? What deselection metadata is needed?
- What match points are available in both the library's data and the comparator target?
- To what degree can comparisons be batched and automated?
- No circulations since 1998
- Publication year before 2000
- More than 100 US Holdings
- More than 10 In-State Holdings (Michigan)
- Not listed in Resources for College Libraries
- Never reviewed in CHOICE
- Fewer than 10 US holdings
- Not represented in Hathi Trust
- No circulations since 1998
- Publication year before 2000
- OCLC WorldCat, using the WorldCat API (220 million records)
- Hathi Trust (4.6 million book titles)
- CHOICE (156,000 titles)
- Resources for College Libraries (60,000 titles)
The bigger cost, however, is embedded in the work of executing those millions of comparisons and capturing the results. One option, of course, is to search each title manually, but at this volume that is impractical and would ultimately cost an enormous amount in staff time. Remember that all 300,000 items had to be searched in four separate places (now three, since Hathi holdings are represented in WorldCat). Unless the target collection is very small, it is clearly preferable to match via batch processes.
A second alternative is to work from a report generated by the library's ILS. Some systems can produce these routinely, but others require local creation of queries or scripts. Even with a good batch of bibliographic data, there are not always mechanisms for matching conveniently against multiple comparator files--these processes must be devised. To interact with the resulting data effectively, it may also be necessary to store results, so that criteria can be modified and new results produced iteratively. At a reasonable scale, much of this can be handled by Excel or Access or their open-source counterparts. But all of this may require a substantial amount of skilled staff time--with the clock running at a minimum of $35/hour.
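To illustrate why stored results matter, here is a minimal Python sketch of criteria applied as adjustable parameters over saved comparison data. It reads the two criteria lists above as withdrawal and preservation rules, which is an interpretation on my part. The record layout, field names, and function names are all hypothetical, and "no circulations since 1998" is assumed to mean that the last recorded circulation falls before 1998.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TitleRecord:
    """Stored comparison results for one title (hypothetical field names)."""
    title_id: str
    pub_year: int
    last_circ_year: Optional[int]  # None means no circulation on record
    us_holdings: int
    in_state_holdings: int
    in_rcl: bool
    reviewed_in_choice: bool
    in_hathi: bool


def no_circ_since(rec: TitleRecord, year: int) -> bool:
    # Assumption: "no circulations since <year>" means last use was before <year>.
    return rec.last_circ_year is None or rec.last_circ_year < year


def is_withdrawal_candidate(rec, circ_cutoff=1998, pub_before=2000,
                            min_us=100, min_state=10) -> bool:
    # Widely held, little used, and not endorsed by RCL or CHOICE.
    return (no_circ_since(rec, circ_cutoff)
            and rec.pub_year < pub_before
            and rec.us_holdings > min_us
            and rec.in_state_holdings > min_state
            and not rec.in_rcl
            and not rec.reviewed_in_choice)


def is_preservation_candidate(rec, circ_cutoff=1998, pub_before=2000,
                              max_us=10) -> bool:
    # Little used but scarce, and with no Hathi copy to fall back on.
    return (no_circ_since(rec, circ_cutoff)
            and rec.pub_year < pub_before
            and rec.us_holdings < max_us
            and not rec.in_hathi)


def summarize(records, **withdrawal_overrides):
    """Count candidates; pass overrides to try a different withdrawal threshold."""
    withdraw = sum(is_withdrawal_candidate(r, **withdrawal_overrides) for r in records)
    preserve = sum(is_preservation_candidate(r) for r in records)
    return {"withdrawal": withdraw, "preservation": preserve}
```

Because the comparison results are kept separate from the rules, a call such as `summarize(records, min_us=50)` re-runs the analysis with a looser holdings threshold without repeating any lookups.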
Even batch-matching costs can vary, depending on what match points are available in both the candidate list and the target data. If the library has OCLC numbers in most records, this speeds matching with WorldCat and Hathi. If the library does not retain OCLC numbers in its local records, LCCN and ISBN matches are possible, but create more work and more exceptions. And some comparator targets, such as CHOICE and RCL, don't include OCLC numbers, so a single library may need several different matching routines.
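The fallback from one match point to the next can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a description of any vendor's routine: records are plain dictionaries with hypothetical `oclc`, `lccn`, and `isbn` keys, and the normalization is deliberately simplified (real LCCN normalization, for example, follows more detailed rules).

```python
import re

# Order in which match points are tried (assumed preference: OCLC, then LCCN, then ISBN).
MATCH_ORDER = ("oclc", "lccn", "isbn")


def normalize_oclc(value):
    """Keep only the digits, dropping prefixes like 'ocm' or '(OCoLC)' and leading zeros."""
    if not value:
        return None
    digits = re.sub(r"\D", "", value).lstrip("0")
    return digits or None


def normalize_lccn(value):
    """Simplified: remove whitespace and lowercase."""
    if not value:
        return None
    cleaned = re.sub(r"\s+", "", value).lower()
    return cleaned or None


def normalize_isbn(value):
    """Drop hyphens and spaces; accept only 10- or 13-character results."""
    if not value:
        return None
    cleaned = re.sub(r"[^0-9Xx]", "", value).upper()
    return cleaned if len(cleaned) in (10, 13) else None


NORMALIZERS = {"oclc": normalize_oclc, "lccn": normalize_lccn, "isbn": normalize_isbn}


def build_index(target_records):
    """Index the comparator file once per match point, so each lookup is a dict access."""
    index = {key: {} for key in MATCH_ORDER}
    for rec in target_records:
        for key in MATCH_ORDER:
            norm = NORMALIZERS[key](rec.get(key))
            if norm:
                index[key].setdefault(norm, rec)  # first record wins on duplicates
    return index


def match(candidate, index):
    """Return (match_point_used, target_record), or (None, None) for an exception."""
    for key in MATCH_ORDER:
        norm = NORMALIZERS[key](candidate.get(key))
        if norm and norm in index[key]:
            return key, index[key][norm]
    return None, None
```

Recording which match point succeeded is useful in practice: titles that fall through to `(None, None)` become the exceptions list that needs manual attention. For a target without OCLC numbers, the same routine simply never matches on the first key.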
Yet another approach, with its own associated cost, is to use a third-party tool or service such as WorldCat Collection Analysis, Library Dynamics, the GIST GDM, or the one offered by SCS. In various ways, these services allow the library to outsource some portion of the data comparison work--once the necessary data has been made available to the vendor. This has some advantages, but also some costs. As an example of completed data comparison, this SCS Collection Summary shows the preliminary results from GVSU:
In this instance, the combined effect of GVSU's deselection criteria resulted in 53,000 withdrawal candidates and 382 preservation candidates. The effect of individual data comparisons is reflected in the bottom section of the chart. The exact cost of completing this sort of analysis remains to be fully understood, as we are just beginning this work. But it is important to see all costs related to batch processes and tools in context. Any project involving more than 5,000 titles will be very time-consuming to do manually. It is also important to remember that we are still gathering information--and that several more steps remain to get the books off the shelves. Next up: selector review and staging.
Links to related posts:
- Cost of Deselection (1)
- Cost of Deselection (2): Fixed Costs
- Cost of Deselection (3): Wage Rates
- Cost of Deselection (5): Title Review from Lists
- Cost of Deselection (6): In-Stack Review
- Cost of Deselection (7): Staged Review
- Cost of Deselection (8): Disposition Options
- Cost of Deselection (9): Data Comparisons Revisited
- Cost of Deselection (10): Summing Up