Tuesday, February 21, 2012

Bibliometrics and Book Retention

As I've stated in other contexts, selection and deselection represent the same intellectual activity, performed at different points in a book's lifecycle. Deselection has one significant advantage, though: it can be based on a track record of circulation, in-house use, and appearance on authoritative lists. In a previous post, The Impact of Books, we began to explore yet another type of historical evidence: citation counts. Highly cited books seem likely to be important books, books worth keeping, books more likely to be wanted in the future. But although it seems reasonable to presume that the number of citations to a book would correlate with discovery and use, we need a deeper understanding of the underlying dynamics.

Bibliometrics "uses quantitative analysis or statistics to describe patterns of publication within a given field or body of literature." Not surprisingly, bibliometric techniques originated in the hard sciences and in the journal literature, but they are now used in many disciplines and increasingly on monographs. Historically, citation analysis has been used to evaluate researchers and departments, and to gauge the impact of a contribution to its discipline. Our purposes are related but somewhat narrower. We are seeking to identify high-impact books within a discipline to ensure that they are retained. Can bibliometrics help identify these titles? What can citation patterns tell us about how intellectual content ages in specific disciplines?

Conceptually, this turns out to be a rich vein, and the literature and data run deep. Consider some of these potential data points:
  • Total number of citations: a straightforward measure of citation frequency. However, it may be useful to distinguish between journal-to-book citations and book-to-book citations. The former can be easily (though partially) retrieved through journal indexes. The latter are beginning to be identified using Google Books and Hathi Trust, but at present are largely unavailable.
  • Average citation frequency: Number of citations per monograph in a discipline. Used to compare activity among disciplines.
  • Citation peak: the point after publication at which the maximum number of citations occurs.
  • Noncitation ratio or Uncitedness Index: absence of citations in a defined time period.
  • Price's Index (citation recency): "calculates the proportion of the number of citations no more than five years old over the total number of citations an item receives."
  • Half-life of citations: a measure of "obsolescence" of scholarly literature, obtained by "subtracting the publication year of source documents from the median publication year of citing documents."
  • Reference decay: the point after which 90% of citations to a work occur.
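Several of these measures reduce to simple arithmetic once we have a book's publication year and the years of the documents citing it. What follows is only a rough sketch under stated assumptions, not the method used in any study discussed here: the function names and sample data are invented for illustration, and Price's Index is read as the share of citations made within five years of the book's publication.

```python
from collections import Counter
from statistics import median


def price_index(pub_year, citing_years, window=5):
    """Share of citations made no more than `window` years after publication."""
    if not citing_years:
        return None  # undefined for an uncited book
    recent = sum(1 for year in citing_years if year - pub_year <= window)
    return recent / len(citing_years)


def half_life(pub_year, citing_years):
    """Median publication year of citing documents minus the book's own year."""
    if not citing_years:
        return None
    return median(citing_years) - pub_year


def citation_peak(pub_year, citing_years):
    """Years after publication with the most citations (earliest on ties)."""
    if not citing_years:
        return None
    per_age = Counter(year - pub_year for year in citing_years)
    top = max(per_age.values())
    return min(age for age, count in per_age.items() if count == top)


def uncitedness(citations_by_book):
    """Fraction of books with no citations in the period covered by the data."""
    uncited = sum(1 for years in citations_by_book.values() if not years)
    return uncited / len(citations_by_book)


# Hypothetical data: citing years for three invented books, all published in 1990.
sample = {
    "Book A (1990)": [1991, 1993, 1993, 1996, 2001, 2004],
    "Book B (1990)": [],
    "Book C (1990)": [1992],
}
book_a = sample["Book A (1990)"]
print(price_index(1990, book_a))    # 0.5
print(half_life(1990, book_a))      # 4.5
print(citation_peak(1990, book_a))  # 3
print(uncitedness(sample))          # one of three books is uncited
```

For the invented Book A, half of its citations arrive within five years, the median citing year falls 4.5 years after publication, and its busiest year is the third after publication. Reference decay is left out because the definition would need to be pinned down more precisely before coding it.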
There are obvious implications here for monographs deselection and retention. These measures provide one kind of insight into the impact and staying power of individual works. They also enable identification of content aging patterns at the disciplinary level, especially when examined by the periods of "knowledge diffusion" or "intellectual acceptance" developed by Lindholm-Romantschuk and Warner (in "The Role of Monographs in Scholarly Communication: An Empirical Study of Philosophy, Sociology and Economics"). These periods are:
  • Initial Reception: "the period of three calendar years from publication (including the year of publication)."
  • Intellectual Survival: the number of years after initial reception that a book continues to be cited.
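To make the two periods concrete, here is a small sketch that sorts a book's citations into them. It is our own illustration, not code from Lindholm-Romantschuk and Warner: the function name and sample years are hypothetical, and we assume survival is counted from the end of initial reception to the most recent citation on record.

```python
def acceptance_periods(pub_year, citing_years):
    """Split citations into initial reception and intellectual survival."""
    # Initial reception: three calendar years, counting the publication year.
    reception_end = pub_year + 2
    initial = [y for y in citing_years if pub_year <= y <= reception_end]
    later = [y for y in citing_years if y > reception_end]
    # Assumption: survival runs from the end of initial reception to the
    # most recent citation seen so far.
    survival_years = max(later) - reception_end if later else 0
    return {
        "initial_reception_citations": len(initial),
        "later_citations": len(later),
        "survival_years": survival_years,
    }


# Hypothetical book published in 1990.
result = acceptance_periods(1990, [1991, 1993, 1993, 1996, 2001, 2004])
print(result)
```

For this invented book, initial reception covers 1990 through 1992 and captures one citation; the remaining five fall in the survival period, which so far extends 12 years (to 2004).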
In an eye-opening 2008 article entitled "Citation Characteristics and Intellectual Acceptance of Scholarly Monographs" Professor Rong Tang of Simmons College employs a number of these concepts to "explore disciplinary difference in the citing of books." Her work centers on 750 randomly selected monographs, 125 each in Religion, History, Psychology, Economics, Math, and Physics. The study seeks to answer two research questions:
"Are there significant domain or disciplinary differences in the distribution of citations to monographs, half-lives, and Price's Index?"
"If conditioned on the periods of intellectual acceptance, are there significant differences among disciplines in terms of citation frequency and number of books cited per period?"
The article presents its methods, concepts, and results clearly. It is well worth reading in its entirety. The table reproduced below begins to show the potential variability across disciplines:

Rong Tang, "Citation Characteristics and Intellectual Acceptance of Scholarly Monographs"

Some of its more surprising results include:
  • Psychology received the highest number of citations, with more than 6,000 and an average of 48.1 citations per monograph, followed by math and physics. History received an average of 3.2 citations per item.
  • Physics has the longest half-life, while humanities disciplines have the shortest.
  • The highest uncitedness ratios occurred in History (52%) and Religion (59%).
  • "...the peak time of citations for six disciplines all occurred within the first 20 years of publication."
  • "Religion and history reached their highest citation amount within the first five years...whereas psychology, physics and mathematics did not receive their citation heyday until more than six years after publication."
  • Citations of most disciplines increase at six years after publication. "The highest potential period of intellectual acceptance is the first 10 years, with the decline and gradual ending of citations during the 11th to 30th years..."
It will take time and experimentation to determine how applicable some of these ideas and findings may be to book retention decisions. The results need to be qualified: the sample was small, and the study considers only article-to-book citations, not book-to-book citations, which may under-represent humanities citations. But the article provides an excellent foundation. A hearty thanks to Professor Tang and predecessors for providing this useful framework.

Wednesday, February 15, 2012

The Impact of Books

Effect of heavily-cited monograph
During a recent monographs deselection project, an astute librarian inquired whether a book's "impact factor" -- the number of times it has been cited in other books or journals -- might be invoked as a title protection rule. Impact factor, of course, is a concept much more highly developed for journals and conference proceedings than for monographs. Often described as a quantitative tool for evaluating journals, impact factor captures how frequently a journal's recent articles are cited: the citations received in a given year to items the journal published in the two preceding years, divided by the number of citable items from those two years. In other words, it is a journal-level average of citations per paper. Results are published annually in Journal Citation Reports. While not without controversy as a performance metric, impact factor is widely used as a shorthand indicator of article and journal quality.

Recently, an impact factor for books has begun to receive some overdue attention. In late 2011, Thomson Reuters introduced the Book Citation Index, available through its Web of Knowledge platform. Despite its bold taglines of "putting books back into the library" and "completing the research picture", it represents a fairly modest beginning. By December 2011, it was projected to include 30,000 titles, with a plan to add 10,000 per year. The Thomson Reuters site describes a careful selection process, and highlights improved discovery and citation navigation as the Index's primary attributes. But there is a clear implication that these are important monographs in their respective fields.

This implication is not without controversy. Metrics such as citation analysis raise the hackles of some researchers, especially in the humanities and social sciences, as shown in a lively exchange of comments following this article from Times Higher Education: "Monographs finally join citations database." On October 13/14, 2011, a Mr Flannigan let it be known that:
"The field of citation counting isn't a 'field' in any intellectual sense. It's a shortcut; an attempt to evade engagement with intellectual content and reduce everything to the logic of a spreadsheet."
"I don't doubt that some disciplines might benefit from citation counting. But I'm sick of scientists imposing their methods onto non-cognate disciplines and demanding that everyone else fall into line."
Several recent articles further explore book and even chapter-level impact using sources other than BCI. "Assessing the citation impact of books: the role of Google Books, Google Scholar, and Scopus", published in November 2011, examines whether these databases can provide "alternative sources of citation evidence", and specifically looks at references to and from books. Planned data mining of the Hathi Trust corpus may open up some new avenues. A 2006 account of a pilot project for the Australian Council for the Humanities, Arts, and Social Sciences tests the extension of citation analysis to books in history and political science:

Source: Linda Butler, Council for the Humanities, Arts, & Social Sciences
We'll follow up on these and other recent works on "bibliometrics" in a subsequent post. (Mark your calendars for that!) For now, let's assume that book impact factors are worth some consideration in decisions about storage, withdrawal, and retention.

As monographs are considered for deselection, there is often a desire to exempt titles that appear on "authoritative" lists or core lists, regardless of whether those titles have been used. Examples include titles listed in Resources for College Libraries or as CHOICE Outstanding Academic Titles, or on discipline-specific accreditation lists. Clearly, titles listed in the Book Citation Index could fall into this category, and might be considered candidates for retention irrespective of other considerations, even as the debate about citation analysis continues.

There is one very practical problem, however. Book Citation Index, as currently constituted, is limited to books with copyright dates in the current year plus 5 previous years in the Sciences, and current year plus 7 previous years in Social Sciences and Humanities. As this is written in early 2012, then, coverage includes:
  • Sciences: books published in 2007 or later
  • Social Sciences & Humanities: books published in 2005 or later
To date, deselection criteria in the projects supported by our firm Sustainable Collection Services have focused on titles published or acquired before 2005--sometimes much earlier. The universe of titles being considered for withdrawal and the universe of most-cited titles in Book Citation Index at present do not overlap at all. For now, impact factor simply cannot play a role in deselection decisions. The relevant data does not yet exist in any consolidated form.
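The window arithmetic, and the lack of overlap with pre-2005 withdrawal candidates, can be checked in a few lines. This is just a sketch of the reasoning above; the function names are ours, and the window lengths are taken from the coverage rules as described in this post rather than from any Thomson Reuters data.

```python
# Window lengths as described in the post: current year plus N previous years.
PREVIOUS_YEARS = {"Sciences": 5, "Social Sciences & Humanities": 7}


def earliest_covered_year(current_year, previous_years):
    """First copyright year inside the coverage window."""
    return current_year - previous_years


def overlaps(deselection_cutoff, earliest_covered):
    """True if any title published before the cutoff falls in the window."""
    latest_candidate_year = deselection_cutoff - 1
    return latest_candidate_year >= earliest_covered


for area, n in PREVIOUS_YEARS.items():
    start = earliest_covered_year(2012, n)
    # Withdrawal candidates here are titles published before 2005.
    print(area, start, overlaps(2005, start))
```

Run for 2012, this gives 2007 for the Sciences and 2005 for Social Sciences & Humanities, with no overlap in either case. Changing the current year shows how the window slides forward; overlap appears only once withdrawal candidate lists begin to include titles published in 2005 or later.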

As the list of titles grows over time, it will become more relevant. But the role of book impact factor in deselection will emerge only as titles published in 2005 and later begin to appear on withdrawal candidate lists. The utility of the impact factor will grow incrementally; under the Book Citation Index model, 10,000 additional titles will be available for analysis each year. In five or ten years, this may be an important data point. But not quite yet. In fact, it may not be necessary at all, since presumably highly-cited books would tend to receive more use. And in deselection decisions, use trumps most other considerations.