For excellent reasons, investigators are being held accountable for their use of public funds and for their research productivity. Accountability itself is a good thing, but we have not yet got the evaluation right.
My doubts about evaluation as currently practiced are not intended to question the need for accountability in research. Yet evaluation methodologists often suggest that any criticism of their methods amounts to opposition to accountability itself. In an era when accountability is needed and demanded, this pretense shields short-sighted evaluation from scrutiny.
Let me illustrate with a story before elaborating further on this theme. Ten years ago, we embarked on a series of epidemiological studies on the hazards and health risks of firefighting that established our group's reputation as serious investigators. This is an issue of intrinsic importance, in cost, suffering, and need for prevention.
The study was difficult to do, overcame a number of obstacles, and introduced several innovations in analysis. There were also many interesting scientific aspects that made the problem a non-trivial exercise in methodology.
In the end, the line of investigation was a huge success. However, while we were engaged in it we were continually under siege.
Along the way we were evaluated at many levels and found wanting. The obvious drawback of peer review -- that qualified reviewers are almost always competitors -- was never so apparent in my experience.
Our failure to keep to an arbitrary timetable was construed as poor management, despite documentation that the delays were beyond our control. In the short term, our study counted as a "negative" in the funding agency's own performance audit because, for a year or so, there was no visible product for them to point to.
We were denounced by an MLA in the provincial legislature as an example of waste.
Our department chair at the time, an evaluation methodologist himself, evaluated our performance on a regular basis but ignored all activity related to the study until publication of the final product.
A major review of population health research in the province, conducted on behalf of a large biomedical research foundation, omitted recognition of our study because its content was not considered to be "health" research by their peculiar definition.
An audit of our department by our university overlooked our work entirely because they were concentrating on health services administration.
The federal agency on which we depended for key data conducted its own evaluation of our proposed work, on the basis of which it took our money, assigned us a low priority, and delayed our work by a year, drastically increasing our costs.
In the end, the funding agency thought so little of the work that they left us out of a publication featuring their successful projects.
From our point of view, and that of our colleagues, the project was an unequivocal success. Scientifically, it was solid work and was well received as a contribution to both epidemiology and occupational health.
Our findings replicated, but also extended and refined, the work of groups much larger and better supported than ours. The study is now regularly cited in the literature and in the adjudication of workers' compensation claims involving firefighters, which is the practical application of this work.
In the end, our study proved to be highly influential, heavily cited and durable. But that is not the essential point of this essay.
The benefits of actually doing the work, quite unrelated to its scientific merit, were remarkable but were completely overlooked in evaluation. It proved our capabilities and launched us into the big leagues of our field. About a dozen people, students and staff, gained important research experience which they later carried into other jobs or used in graduate study; one of the students won a research award. The database has since been used for three unrelated studies and a set of graduate student exercises. An important confirmatory follow-up study based on our work was conducted by a U.S. federal agency, further enhancing our credibility and reputation.
None of these "spin-off" activities were funded by the original grant. None were recorded in any evaluation of our research impact. None are easy to document as a direct product of the original grant. However, the impact they had was a key factor in our later success and a benefit to our institution and the research community.
Perhaps a superb evaluation methodologist would have captured some fraction of these benefits, but it is hard to see how. We have just been lucky that our work was recognized in the end.
Citation analysis could have missed the impact of our work completely, because only academics cite other academics in the scholarly literature, and they tend to do so soon after a paper appears. (Most papers are never cited at all.) Our competitors had no incentive to cite us once they had published their own study.
The greatest impact of this work has been in a world (of insurance, workers' compensation, and fire department affairs) where citation and even documentation are inconsistent and uncommon. Citation analysis may work well in basic science, but in applied fields -- precisely where government funding sources are now most likely to invest -- it is a very poor measure.
Cost-benefit analysis would have been nearly impossible to demonstrate because the hundreds of millions of dollars affected by decisions regarding benefits to firefighters and their families are distributed over many jurisdictions and tied up for years in adjudication, appeals and litigation. How do we place a value on the impact on families, or the responsibility of society to members of a heroic public service that keeps them safe from harm?
It is especially hard to see how the impact could have been assessed within a reasonable time after the award was made. Of course, the person doing the evaluation usually has no real insight into the quality of the work or the value of the study. Indeed, people familiar with the content are often viewed as biased when they act as evaluators.
Quality is so difficult to measure that evaluation methodologists evade the issue and substitute deeply flawed surrogate measures such as peer review and indicators of influence on other investigators.
Lacking the ability to measure value in content, evaluation methodology falls back on either shorter-term outcomes (publications) or process criteria. Everyone deprecates the simple counting of publications, then does it anyway or conducts the equivalent in choosing other indicators of quantity.
Evaluation methodology for research seems to have grown out of educational psychology and the psychometrics of student performance. I believe it works well in that area, where group (as opposed to team) performance matters, testing reflects a current body of knowledge, and the essential skills are the acquisition, retention and assimilation of existing knowledge. None of this is true for research.
Evaluation of research, as currently practiced, is based on the wrong premise. It assumes that each research project is separate and distinct and that the products of that research, whether publications, presentations, or patents, are complete in themselves. That has not been true for most serious research since the nineteenth century.
Serious basic research fits into a structure that is much larger, fluid, and depends for its vitality and self-correction on exactly those characteristics that funding agencies seem to abhor: duplication (replication), negative results (the flip side of verification), lack of focus (generalizability), intuition (inductive reasoning and hypothesis formation) and lack of relevance, which means moving beyond the narrow frame of reference (innovation).
Likewise, applied research that is truly innovative has unanticipated applications beyond the frame of reference of evaluation. The problem that a good study was originally designed to solve is just the beginning. If it is truly worthwhile it will lead to other levels of inquiry.
The value of a piece of serious research is seldom obvious within an administrative time frame because it cannot be evaluated definitively until the field is further advanced.
The fundamental fallacy is the assumption that a piece of research exists in isolation. In truth, none does: how a study fits into the overall structure of knowledge has little to do with the research project itself and everything to do with that structure. How well the investigator understands the structure is what counts, and what makes a piece of research a useful contribution or junk.
It has been my experience that young investigators who succeed in research careers have two essential characteristics: imagination and a capacity for analysis. By the latter I do not necessarily mean a command of the latest statistical package, but a logical approach to interpretation that balances an intuitive approach to the problem itself with a capacity to interpret the outcome in a meaningful context. Nobody measures this.
Grant applications capture these qualities poorly, although every once in a while they shine through if the applicant has not been overly drilled in the false skill of "grantsmanship."
This set of skills is a worthy topic for evaluation methodologists to investigate. Their professional ascendancy in granting agencies and the research community should be resisted until they show a mastery of this essential dimension of research.
Tee L. Guidotti, MD, MPH, FRCPC, CCBOM, MFOM is professor of occupational and environmental medicine and Killam Annual Professor 1996-1997 at the University of Alberta.
The views expressed are those of the author and not necessarily those of CAUT.