Monday, August 29, 2011

Fixin' to Weed

As a lifelong Northerner, there are certain words I don't get to use. Y'all, for instance, will never sound right coming out of my mouth. Southern friends have warned me that I could be fined or jailed for saying it. (Well, OK, not in so many words, but you can tell.) Another example is fixin', as in I'm fixin' to have me some catfish. Fixin' is a fine word, full of resolve and focus. I figure it's safe to use as long as I don't actually speak it aloud. It's about getting ready, getting mentally prepared, even looking forward to something. As in 'I'm fixin' to weed the business section.'

Well, September's right around the corner, and with the start of the new academic year, we expect some librarians are fixin' to do some weeding this year--at least once all of those instruction sessions are over. We at SCS would like to suggest that fixin' to weed actually is an important first step, and that it's pretty straightforward. In less than a single day, it's possible to gauge the potential of a deselection project, by taking these 5 steps:

Look at your circulation data:  Ask for a report that shows how many books have not circulated in the past 10 years. Screen out reference and special collections titles from consideration. Other bits of data can help, e.g., date of last circulation, date acquired, but are not essential at this stage. Most libraries find that 40%-50% of their collection has not circulated in a decade. This report defines your library's sweet spot, the potential yield of a deselection project.

Look at your space: How crowded are the stacks? How busy are the stacks? How crowded are study spaces? What might you do with an additional 10,000 square feet? A writing center? An expanded information commons? Some big flatscreen monitors or whiteboards for collaboration? A coffee shop?

Look at alternatives for maintaining low-use content: How many copies of the same books are also in the collections of peers or borrowing partners? Can the library join a shared print retention initiative? How many holdings are shown in WorldCat? In what other forms might the same content be available; i.e., how readily replaceable or re-accessible is it?  

Look at an SCS Sample Report: See how aggregated deselection metadata can expedite decision-making. Collection summary reports help identify the most fruitful areas, and enable experimentation with deselection criteria. Withdrawal candidate lists highlight titles that meet those criteria. A library can also assemble this circulation and deselection metadata on its own.

Look at the cost of doing nothing: While the biggest costs associated with inaction are opportunity costs (what else the library could do with the space occupied by unused material), there are also direct costs. These have been calculated by Courant and Nielsen at $4.26 per volume per year for titles in open stacks, and $0.86 per volume per year in high-density storage. Maintaining the status quo may be a desirable option, but it is not free.

If the results in your library at all resemble what we've seen to date, you'll quickly move from fixin' to weed to chafin' to weed. Straightforward consideration of these five points won't take long, and will give you clear insight into the potential benefits of deselection in your own library. In effect, the data will help you make the case for deselection to yourself and your colleagues, and will echo David Maister's comment in his excellent Strategy and the Fat Smoker: "The necessary outcome of strategic planning is not insight but resolve."
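
For a library that wants to rough out the first and last of these steps on its own, here is a minimal Python sketch. It is an illustration under stated assumptions, not the SCS method: the `Item` record, the location codes, and the `deselection_snapshot` function are all hypothetical names, and it assumes the circulation report carries a date of last circulation (which, as noted above, is helpful but not essential). The two cost constants are the Courant and Nielsen figures quoted above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Direct carrying costs per volume per year (Courant and Nielsen, as cited above).
OPEN_STACK_COST = 4.26
HIGH_DENSITY_COST = 0.86

# Hypothetical location codes; a real catalog will use its own.
EXCLUDED_LOCATIONS = {"reference", "special"}


@dataclass
class Item:
    barcode: str
    location: str
    # Assumption: None means the system has no recorded circulation for the item.
    last_circulated: Optional[date]


def deselection_snapshot(items, cutoff):
    """Summarize how many eligible items have not circulated since `cutoff`."""
    # Screen out reference and special collections titles.
    eligible = [i for i in items if i.location not in EXCLUDED_LOCATIONS]
    # An item is "idle" if it never circulated or last circulated before the cutoff.
    idle = [
        i for i in eligible
        if i.last_circulated is None or i.last_circulated < cutoff
    ]
    share = len(idle) / len(eligible) if eligible else 0.0
    return {
        "eligible": len(eligible),
        "idle": len(idle),
        "idle_share": share,
        "annual_cost_open_stacks": len(idle) * OPEN_STACK_COST,
        "annual_cost_high_density": len(idle) * HIGH_DENSITY_COST,
    }
```

To put the cost figures in perspective with a made-up collection: if 40% of 500,000 eligible volumes have sat idle for a decade, that is 200,000 volumes, or roughly $852,000 per year in open stacks versus $172,000 per year in high-density storage.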

      Tuesday, August 16, 2011

      As Good As The Data

      My partners and I at Sustainable Collection Services (SCS) coined the term 'data-driven deselection' as shorthand for our service offering and web application. We believe that solid data can help rationalize the necessary drawdown of print monograph collections, a process that sometimes elicits strong emotions. Good data  lays out the facts and sets context. Consideration of circulation rates, the number of other copies in the state, region, or nation, and the existence and accessibility of secure digital versions makes informed retention decisions possible. Data-driven deselection assures that withdrawals take place only when a title is well-secured in the collective collection. The very same data assures that the collective collection does not remain overloaded with copies of low-use books. It enables intelligent action.

      Data, of course, is not monolithic. Despite the prevalence of library standards and agreed practices, there can be substantial differences among data from seemingly similar collections. For monograph deselection, the working data set includes not only bibliographic records, but item/holdings information (e.g. location, barcode number, enumeration), and circulation data. Even when bibliographic data is relatively consistent, item records and circulation data often vary a great deal. A few observations from our early experience:

      There are no perfect catalogs. Not exactly a news flash, but all catalogs include a healthy number of mistakes. Some matter more than others, and some matter more to users than to the analytics SCS is performing. Our work allows us to ignore most problems related to descriptive cataloging, but SCS does rely heavily (though not exclusively) on control numbers. The OCLC number, LCCN, and ISBN comprise the holy trinity for matching a library's holdings against WorldCat, HathiTrust and other target data sets. Control numbers seem straightforward, and in fact are--assuming that:  

      1) they are actually present;
      2) they are formed correctly;
      3) prefixes are entered consistently; and
      4) mysterious errors such as the insertion of a '7' in front of some OCLC numbers have not occurred.

      Suffice it to say that data normalization on these fields is an essential first step.
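
To give a flavor of what that normalization involves, here is a small Python sketch. It is only an illustration, not a description of the SCS application: the function names are hypothetical, and it covers just two of the checks listed above (consistent prefixes on OCLC numbers, and well-formed ISBNs as judged by the standard check-digit rules). It makes no attempt to detect the mysterious stray '7'.

```python
import re
from typing import Optional


def normalize_oclc(raw: str) -> Optional[str]:
    """Return the bare digits of an OCLC number, or None if it cannot be parsed."""
    text = raw.strip().lower()
    # Drop the commonly seen prefixes: "(OCoLC)", then "ocm", "ocn", or "on".
    text = re.sub(r"^\(ocolc\)\s*", "", text)
    text = re.sub(r"^(ocm|ocn|on)\s*", "", text)
    if not re.fullmatch(r"[0-9]+", text):
        return None
    # Leading zeros are padding, so strip them before matching.
    return text.lstrip("0") or None


def isbn_is_well_formed(raw: str) -> bool:
    """Check an ISBN-10 or ISBN-13 against its check digit."""
    chars = raw.replace("-", "").replace(" ", "").upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        # ISBN-10: weights 10 down to 1, with X standing for 10; sum divisible by 11.
        total = sum(
            (10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(chars)
        )
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", chars):
        # ISBN-13: alternating weights 1 and 3; sum divisible by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
        return total % 10 == 0
    return False
```

With something like this in place, "(OCoLC)ocm00012345" and "12345" reduce to the same key before matching, and a mistyped ISBN can be set aside rather than silently failing to match.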

      There are even fewer perfect inventories. Even the loveliest bibliographic record cannot directly answer the question 'is this item really on the shelf?' Shelf-reading and regular inventories are the sorts of tasks that libraries often defer in the press of other business. This is a logical trade-off in an era when print use is declining. But like all deferred maintenance, it eventually bites back. As shared print collections become more important, reliable inventory data is essential. That reliability is not a given at present; just ask any ILL librarian. Therefore, the date--and results--of the library's most recent inventory should be articulated in any deselection project.

      Errors and anomalies also scale. Efficiency, batch processing, and scale are essential to library operations. It is important to remember that the flip side of these approaches is the possibility of systemic error. In a large data set, even a minuscule rate of errors or gaps can result in a sizable raw number of exceptions. A set of 1 million records that is 95% accurate includes 50,000 errors or questions. Such a data set would be rated AAA--and probably doesn't exist.

      Holdings in WorldCat and regional union catalogs are not always current. Spurred by journal de-accessioning projects, many libraries in recent years have embarked on OCLC "reclamation" projects, to assure that all holdings in the library's catalog are represented in WorldCat. This sort of recalibration is a good thing; the more libraries that pursue it, the more reliable the holdings information in WorldCat. Reclamation also benefits monographs, and improves the accuracy of the WorldCat data on the number of copies held in the collective collection. The integrity of the shared print collection depends on verified holdings. The final step in any withdrawal project should be the removal of holdings from OCLC (or, as shared print archiving grows, the replacement of the library's holding symbol with that of the regional storage facility upon which it relies).

      Circulation data varies widely and wildly. This point has come home to us with vigor recently, as we begin work with a small group of libraries seeking to share responsibility for retention of low-use print monographs. The first task is to identify those low-circulation titles, which requires combining and normalizing circulation data. This is more difficult than it sounds.  Three different library systems are in use among the group, which means that circulation data is captured in different ways. Some libraries have total circulations back to 1988; others only for a few years. Some libraries retain the date of last circulation (at least for some segment of the data); others do not. Some libraries include in-house use, ILL, and reserve transactions in their circulation counts; others do not. Some libraries use their circulation module to 'check out' books to Acquisitions or Cataloging or Bindery while they are in process; others do not. What common usage data exists across all participating libraries? What level of analysis will the data support? Stay tuned on that one; there's a good deal of work to do first.
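
One way to approach the "what common usage data exists" question is to check, library by library, which usage fields are actually populated, and then take the intersection. The Python sketch below is purely illustrative: the `CircRecord` shape and its field names are hypothetical, not the export format of any particular library system, and it does nothing about in-process "checkouts" to Acquisitions or Bindery, which would need to be filtered out separately.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical usage fields; real exports will differ from system to system.
USAGE_FIELDS = ("total_checkouts", "last_checkout", "in_house_uses")


@dataclass
class CircRecord:
    library: str
    barcode: str
    total_checkouts: Optional[int] = None
    last_checkout: Optional[date] = None
    in_house_uses: Optional[int] = None


def common_usage_fields(records):
    """Return (per-library coverage, fields populated on every record everywhere)."""
    by_library = {}
    for rec in records:
        populated = {f for f in USAGE_FIELDS if getattr(rec, f) is not None}
        if rec.library in by_library:
            # Keep only fields that every record from this library supplies.
            by_library[rec.library] &= populated
        else:
            by_library[rec.library] = populated
    common = set.intersection(*by_library.values()) if by_library else set()
    return by_library, common
```

Whatever survives in the common set is the level of analysis the whole group can support; anything else is available only for a subset of the partners.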

      These are just examples, of course. But they begin to illustrate the need for caution and precision in handling the data on which deselection decisions will be based. At present, when so many libraries have so much overlapping and unused content, it is possible to set aside any items with questionable data and still have plenty of scope to act. There are enough items with good data to achieve first-round deselection targets. For now, we can make significant progress by acting on only what the best data supports. Longer-term, this will get more complicated. As a community, we'll need to improve the data, or agree to run bigger risks.

      Monday, August 8, 2011

      Discarding Useless Materials

      NYPL on opening day, May 23, 1911
      In April 1911, just one month before the New York Public Library opened its grand new main library on the site of the old Croton reservoir, New York State's Inspector of Public Libraries Asa Wynkoop contributed two short articles to New York Libraries: "Gifts of Books" and "Discarding Useless Material." The full citation appears in a previous post, where I also describe my pursuit of these obscure writings. I was curious to know how weeding and deselection were described a century ago--when there was no electronic content, when books were much scarcer, and when the great print collections in the US were just beginning to be built.

      A few added lines to the sketch. A hundred years ago, in 1911:
      • 11,123 books were published, according to the American Library Annual. That's less than 10% of the current rate of publication.
      • UC/Berkeley's new Doe Library opened with 160,000 volumes. (Berkeley's entire collection at the time consisted of 210,000 volumes.) Doe was built with decades of growth in mind, to hold 800,000 volumes. As of 2009, Berkeley reports holdings of 11 million book volumes.
      • Human memory was a primary backup system. New York Libraries reported that "a fire destroyed the copy--nearly ready for the printer--of the Tentative selection from the best books of 1910, together with all the notes on which it was based and the books themselves." The list "will be reconstructed in large measure from the memories of those who have been actively engaged in the preliminary work..."

      Berkeley's Doe Library under construction in 1909
      As ever, though, space and other resources were in short supply. Most libraries had to confront these limitations, and some scrutiny of collections was in order. Mr. Wynkoop's hard-headed, practical advice, though aimed at smaller libraries, resonates surprisingly well today even at the research level. In effect, the abundance of print collected over the past hundred years has rendered even the largest libraries "small."

      The first article, on "Gifts of Books", sought to prevent the acquisition of unwanted material in the first place. Mr. Wynkoop recommends: "Never place on the shelves a book which [the library] would not select and buy if it had the money." He goes on to say "it should further be borne in mind that it costs a library far more in the course of years to keep and care for a book than to buy it. Every book on the shelves is a positive and continuous expense, and it is a simple waste of a library's resources to incur this expense unless the book is likely to yield an actual return."  As concise a case for caution in regard to "free" books as I've come across. Note also the century-old focus on ROI.

      The second piece, "Discarding Useless Material," addresses the management of books that have reached the shelves. In contemporary terms, it encompasses cost avoidance, lifecycle costs, discoverability, and collection sustainability. Overall, Inspector Wynkoop has some strong words for the profession (emphasis added):
      "Librarians show a good deal of timidity and lack of a definite policy [in] the discarding from their shelves of obsolete and useless material."
      "When a book once gets on the shelves, it seems to acquire in the eyes of most librarians a peculiar virtue and reverence, irrespective of any service it may render."
      "In how many libraries where costly additions of new rooms or buildings have been necessitated to accommodate the growing collection, could this expense have been spared and the money utilized for positive enrichment of the collection, had the shelves been freed from the dead material with which they are encumbered!"
      "Good live books are often lost or buried among dead ones. It has been shown by experiment again and again that a collection of best books, when grouped by themselves, receive twice as much use as when scattered among old and obsolete material."
      In short, he speaks about weeding and deselection directly and vigorously. Mr. Wynkoop also has little patience for agonizing over deselection decisions, and here his argument takes an interesting turn.
      "There is no intrinsic reason why [elimination of unused books] should be such a difficult or delicate task. It is certainly easier to know the value of a book which has been on the shelves for years than that of a book which has not yet been bought. Every time books are selected for purchase, other books are rejected, and rejection after purchase and after a test of years is certainly an easier matter than rejection before purchase where no tests of actual value have been possible."
      Here is a perspective I had never considered, namely that deselection occurs throughout a book's lifecycle. No library buys every title published. Every library deselects extensively from the endless stream of new publications. Deselection at that point is a form of speculation. Deselection that occurs after ten years on the shelf without a single use is a clearer and more defensible decision.

      The Inspector keeps a stern eye on costs throughout his argument, especially "the fallacious idea that the main expense of a book is its original cost." In recognition of the "reverence" of  library boards for "mere size and numbers", he suggests "establishment of a storage department, away from the public shelves, to which all obsolete and useless matter can be transferred." It's a bracing and eerily relevant read from beginning to end.

      In his closing, he contends that a good library "is not a mere accumulation of books but a selection, and this selection should represent not a mere succession of past acts but a continuous and active process." In my own closing, let me just doff my hat to the good Inspector and humbly quote Hunter S. Thompson, among others: "Res ipsa loquitur." Good sense stands the test of time.
      Photo Credits:

      NYPL: - Library of Congress, Prints and Photographs Division, Bain Collection - Reproduction number: LC-DIG-ggbain-09235

      Doe Library: UC/Berkeley Library History Room

      Monday, August 1, 2011

      Old School

      The Apex of Civilization (1987)
      It's shockingly easy to forget how far library services have advanced in the past 25 years. Even those of us who thought a self-correcting electric typewriter was the apex of civilization now expect -- and take for granted -- remote desktop access to most content. Last week I was reminded of just how much overhead can still be involved in information seeking. It made me want to assail the nearest campus with a bullhorn, shouting "You don't know how good you have it!" I always loved it when my parents said stuff like that to me.

      Several months ago, I came across an interesting citation in ALA Fact Sheet 15: Weeding Library Collections, alerted to it by Karen Muller's Ask the ALA Librarian blog:  

      Wynkoop, Asa. "Discarding Useless Material." Wisconsin Library Bulletin. 7, no. 1 (1911): 53.

      I appreciated the literal deployment of the word "useless" to describe materials that had not circulated. I grew curious about how library weeding was discussed a hundred years ago, and thought a centennial snapshot might prove interesting. Since I am a modern man, I began my search in Google. An excellent summary from the Wisconsin Library Heritage Center appeared as the first result. Good start. It noted that most of the Bulletin had been digitized as part of Google Books. Very promising. 1905, 1907, 1908, 1910 are available. 1917... yes!  As a UK football fan might say, "unlucky!"

      Plan B. As a network-level denizen, I next opted to search WorldCat. I immediately found the record for Wisconsin Library Bulletin, and keyed in my zip code. As I had feared, the nearest copy lay 71 miles away, in the Beatley Library at Simmons College, past destination of my library-school papers, as bashed out on the aforementioned IBM Selectric. Like many a librarian who works outside of an academic institution, however, I have limited access to research databases, and virtually none to ILL or document delivery for such specialized titles. And at that pesky sub-network level, where objects actually have to be transported from one place to another, well, I live in New Hampshire and the bound volume is in Boston. (See Karen Coyle's recent blog post "Unequal Access.") Bottom line...road trip! 

      I-93 South
      On the up side, I know the trail well. I cleared my calendar for the next day, filled up the Matrix, and headed south. It was a Thursday, late enough to miss full rush hour, but...still a whisker shy of the Platonic form of a good day. Luckily I had Old 97's newest, Grand Theater, Volume 2, to keep me happy. That is one cheerful-sounding band. Rather than shatter all that good humor, let's just say that the drive allowed me to hear all 13 songs twice. Parking around Simmons can be tough, too, but I lucked out with a garage that only charged $13 for the first hour.

      Target Database
      Positively vibrating with...something, I walked a few blocks to the Library, and began to troll the compact shelving. Found the WLB run from 1911, after a heart-stopping moment when a stack of microfilm boxes entered my field of vision. (Please, not that.) Suddenly, there it stood, exactly where it was supposed to be. Success! The old-school high. It really is a thrill to find something after so much trouble.

      In skimming the WLB version of the article, I noticed it had in fact been reprinted from the April 1911 issue of New York Libraries. As now, the topic of weeding was sweeping the land. Might as well have the original, I reckoned--and Simmons' ancient bound periodical collection came up aces again. The text of the two versions proved identical, but the NYL version did yield one new nugget: Asa Wynkoop, the article's author, held the august position of New York's "Inspector of Public Libraries." What an excellent title. What a fine idea. Where are they now--those Inspectors of Public Libraries? We'll actually get to that question and to "Discarding Useless Material" in next week's post. But first, let's finish the old-school retrieval process.

      No copy and paste here, no instant printing. Shades of 25 years ago at Simmons, I spent some quality time with the photocopier. Instead of searching for coins, one now obtains a visitor card, goes online and adds value to that card. So there's that improvement, which takes only a few minutes longer than finding change for a 5-dollar bill. One copier appears to be working but is occupied. One is out of paper. One bears an abstruse message about "key reset" and does not respond to any command. When the single functioning unit is free, it turns out to work no better than I recall from my student days, as these scans will attest. I have new respect for the scanners at Google, the Internet Archive and everywhere. But at last, I had the information in my hands. Now all I had to do was get home with my treasure.

      Not so very long ago, some version of this experience was the norm. Clearly it can still occur. If only Mr. Wynkoop had had the courtesy to publish his piece in 1910 (already scanned and available through Google Books and HathiTrust), I could have learned in 5 minutes what ate up most of a day. On some level, that's what we all expect; it's startling to be reminded how completely things have changed. The productivity improvements unleashed by digital access to content can hardly be overstated. But it can also be easy to forget that mass digitization is still a work in progress, that gaps remain in the electronic version of our professional record. And equally startling that some of those gaps can still be filled by print. With some effort.

      Old-school information retrieval was (and is) time-intensive, labor-intensive, expense-intensive. Often it's impossible to know beforehand whether the item you're seeking is worth the investment. A citation is just a starting point. You have to be pretty damn motivated (obsessed? foolish?) to drive 150 miles, spend $40 on gas, tolls, and parking, and burn a full day to retrieve an article that turns out to consist of three long paragraphs. Most contemporary students would not consider this rational behavior. I'm not even sure I do.

      But I am glad to have the article. I'm glad to know that old-school still works when needed. I'm also glad that for the most part I can search, click, and view without this insane degree of overhead. I look forward to the day when the entire run of the Wisconsin Library Bulletin is available digitally. But in the meantime, I'm also glad that Simmons had these 100-year-old print volumes, and that their use will now be counted. As of last Thursday, they are decidedly not useless materials.

      Use Study in Progress