Data, of course, is not monolithic. Despite the prevalence of library standards and agreed practices, there can be substantial differences among data from seemingly similar collections. For monograph deselection, the working data set includes not only bibliographic records, but also item/holdings information (e.g., location, barcode number, enumeration) and circulation data. Even when bibliographic data is relatively consistent, item records and circulation data often vary a great deal. A few observations from our early experience:
There are no perfect catalogs. Not exactly a news flash, but all catalogs include a healthy number of mistakes. Some matter more than others, and some matter more to users than to the analytics SCS is performing. Our work allows us to ignore most problems related to descriptive cataloging, but SCS does rely heavily (though not exclusively) on control numbers. The OCLC number, LCCN, and ISBN form the holy trinity for matching a library's holdings against WorldCat, HathiTrust, and other target data sets. Control numbers seem straightforward, and in fact are--assuming that:
1) they are actually present;
2) they are formed correctly;
3) prefixes are entered consistently; and
4) mysterious errors such as the insertion of a '7' in front of some OCLC numbers have not occurred.
Suffice it to say that data normalization on these fields is an essential first step; the sketch below gives a sense of what that cleanup involves.
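To make that concrete, here is a minimal sketch of control-number normalization, assuming OCLC numbers arrive with the usual mix of prefixes ('(OCoLC)', 'ocm', 'ocn', 'on') and that ISBNs may carry hyphens. The function names and rules are illustrative, not SCS's actual pipeline:

```python
import re

def normalize_oclc(raw: str) -> str | None:
    """Strip common prefixes ('(OCoLC)', 'ocm', 'ocn', 'on') and
    leading zeros so the same number always compares as equal."""
    m = re.match(r"(?:\(OCoLC\))?\s*(?:oc[mn]|on)?\s*0*(\d+)$",
                 raw.strip(), re.IGNORECASE)
    # Malformed values return None; an error like a stray '7' prepended
    # to an otherwise valid number can't be caught this way and still
    # needs review against the source record.
    return m.group(1) if m else None

def normalize_isbn(raw: str) -> str | None:
    """Drop hyphens and spaces; keep only plausible 10- or 13-digit
    ISBNs (check-digit validation would be the next refinement)."""
    value = re.sub(r"[-\s]", "", raw.strip()).upper()
    return value if re.fullmatch(r"\d{9}[\dX]|\d{13}", value) else None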
Holdings in WorldCat and regional union catalogs are not always current. Spurred by journal de-accessioning projects, many libraries in recent years have embarked on OCLC "reclamation" projects to ensure that all holdings in the library's catalog are represented in WorldCat. This sort of recalibration is a good thing; the more libraries that pursue it, the more reliable the holdings information in WorldCat becomes. Reclamation also benefits monographs, improving the accuracy of WorldCat data on the number of copies held in the collective collection. The integrity of the shared print collection depends on verified holdings. The final step in any withdrawal project should be the removal of holdings from OCLC (or, as shared print archiving grows, the replacement of the library's holding symbol with that of the regional storage facility upon which it relies).
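At its core, a reclamation project is a set comparison: which control numbers does the local catalog hold that WorldCat does not show for the library's symbol, and vice versa? A rough sketch, assuming two text files of already-normalized OCLC numbers (the file names are hypothetical; one export from the ILS, one from an OCLC holdings report):

```python
def load_numbers(path: str) -> set[str]:
    """Read one normalized OCLC number per line into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

local = load_numbers("catalog_oclc_numbers.txt")   # from the ILS
worldcat = load_numbers("worldcat_holdings.txt")   # from OCLC report

to_set = local - worldcat    # held locally, holding not registered
to_unset = worldcat - local  # registered, but no longer in the catalog
print(f"{len(to_set)} holdings to add, {len(to_unset)} to remove")
```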
Circulation data varies widely and wildly. This point has come home to us with vigor recently, as we have begun work with a small group of libraries seeking to share responsibility for retention of low-use print monographs. The first task is to identify those low-circulation titles, which requires combining and normalizing circulation data. This is more difficult than it sounds. Three different library systems are in use among the group, which means that circulation data is captured in different ways. Some libraries have total circulation counts back to 1988; others only for a few years. Some libraries retain the date of last circulation (at least for some segment of the data); others do not. Some libraries include in-house use, ILL, and reserve transactions in their circulation counts; others do not. Some libraries use their circulation module to 'check out' books to Acquisitions or Cataloging or Bindery while they are in process; others do not. What common usage data exists across all participating libraries? What level of analysis will the data support? Stay tuned on that one; there's a good deal of work to do first.
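One way to find the lowest common denominator is to map each system's export onto a shared record and then ask which fields every library can actually supply. The field names below are hypothetical, meant only to illustrate the approach, not the group's actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CircRecord:
    """Lowest-common-denominator circulation data for one item.
    None means "this system does not capture this," which is
    different from a genuine zero."""
    barcode: str
    total_charges: int | None       # counts may start in different years
    count_starts: int | None        # earliest year the count covers
    last_charge: date | None        # not every system retains this
    includes_in_house: bool | None  # in-house use counted? unknown = None
    includes_ill: bool | None       # ILL transactions counted?

def comparable_fields(records: list[CircRecord]) -> set[str]:
    """The usage fields populated in every record -- i.e., the data
    the whole group can actually be compared on."""
    candidates = ("total_charges", "count_starts", "last_charge")
    return {f for f in candidates
            if all(getattr(r, f) is not None for r in records)}
```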
These are just examples, of course. But they begin to illustrate the need for caution and precision in handling the data on which deselection decisions will be based. At present, when so many libraries have so much overlapping and unused content, it is possible to set aside any items with questionable data and still have plenty of scope to act. There are enough items with good data to achieve first-round deselection targets. For now, we can make significant progress by acting on only what the best data supports. Longer-term, this will get more complicated. As a community, we'll need to improve the data, or agree to run bigger risks.
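In practice, "acting on only what the best data supports" can be as simple as partitioning the candidate list. A hypothetical filter, assuming records carry normalized fields like those sketched above:

```python
def partition(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split deselection candidates into those whose data we trust
    and those set aside until the data improves."""
    actionable, questionable = [], []
    for r in records:
        # Hypothetical criteria: a normalized control number and a
        # circulation count we know how to interpret.
        if r.get("oclc") and r.get("total_charges") is not None:
            actionable.append(r)
        else:
            questionable.append(r)
    return actionable, questionable
```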