ROC Analysis and the ROC Convex Hull


In our work in data mining and machine learning, we often face domains in which class distributions are greatly skewed and/or classification error costs are unequal. In these situations, evaluating classifiers is very difficult because classification accuracy, the metric by which most evaluation is currently done, is completely inadequate. To make matters worse, class distributions in these domains often drift over time, and error costs may be known only approximately.

We've developed a robust framework for evaluating learned classifiers, based on ROC analysis, which enables us to analyze and visualize classification performance separately from assumptions about class distributions and error costs. The method computes the ROC convex hull and allows us to:

  • analyze classifier performance over a broad range of performance conditions (error costs and target class distributions),
  • determine the range of conditions under which a given classifier will be best, and
  • determine easily the best available classifier(s) for any particular conditions.
We now use this method extensively in our applied work and in our research on classification, and other researchers have begun using it as well.
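
To make the geometry concrete, here is a minimal Python sketch (my illustration for this page, not code from the papers or from ROC_algs.tar.gz) that computes the ROC convex hull of a set of classifiers, each summarized by a (false positive rate, true positive rate) point:

    def roc_convex_hull(points):
        """Return the vertices of the ROC convex hull, left to right.

        points: iterable of (fpr, tpr) pairs.  The trivial classifiers
        (0,0) ("always negative") and (1,1) ("always positive") are
        added automatically.
        """
        pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
        hull = []
        for p in pts:
            # Pop the last hull vertex while it lies on or below the
            # chord from its predecessor to p (not a convex turn).
            while len(hull) >= 2:
                (x1, y1), (x2, y2) = hull[-2], hull[-1]
                cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
                if cross >= 0:
                    hull.pop()
                else:
                    break
            hull.append(p)
        return hull

    print(roc_convex_hull([(0.1, 0.5), (0.3, 0.7), (0.5, 0.6), (0.4, 0.9)]))
    # [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]

Classifiers whose points fall below the hull, like (0.3, 0.7) and (0.5, 0.6) above, are suboptimal under every combination of error costs and class distributions.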

Tutorial Paper

I've written an introductory paper, ROC Graphs: Notes and Practical Considerations for Researchers, that serves both as a tutorial introduction to ROC graphs and as a practical guide for using them in research.  This paper was later edited down and published in Pattern Recognition Letters under the title An Introduction to ROC Analysis.

The programs from this paper (the Pattern Recognition Letters version) are available in a tar package called ROC_algs.tar.gz.
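
At the heart of those programs is the threshold-sweep construction the paper describes: sort instances by decreasing classifier score and emit one ROC point per distinct score value. Here is a minimal Python sketch of that idea (mine, not the packaged code; it assumes at least one positive and one negative instance):

    def roc_points(scores, labels):
        """scores: classifier outputs; labels: True for positives."""
        P = sum(labels)
        N = len(labels) - P
        pts, tp, fp, prev = [], 0, 0, None
        for s, is_pos in sorted(zip(scores, labels), reverse=True):
            if s != prev:              # emit a point between distinct scores
                pts.append((fp / N, tp / P))
                prev = s
            if is_pos:
                tp += 1
            else:
                fp += 1
        pts.append((fp / N, tp / P))   # final point, always (1.0, 1.0)
        return pts

    print(roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))
    # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

Emitting points between distinct scores, rather than per instance, is what keeps the curve honest when many instances share the same score.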

Bibliography/Notebook of ROC Work

I maintain a bibliography/notebook of ROC work for my own research. I use an open-source package called Zotero, an add-on to the Firefox web browser. Zotero lets you take notes, add bibliographic entries, remember papers, add keywords, and so on as you browse research papers and websites. It also lets you export a complete notebook or a bibliography file in many formats. I highly recommend it. My Zotero notebook is here -- it can be imported directly into your Zotero library -- and a BibTeX database created from it is here.

If you modify or extend the notebook, please send me any changes.  Eventually Zotero will have the ability to share notebooks on the web, at which point I'll upload it to the Zotero servers and make it shareable.

Talks

Here is an on-line talk (PowerPoint) on the ROC Convex Hull, based on a presentation I gave at HP Labs. The slides are taken from various talks by Foster and me, and include a few from Foster's data mining evaluation tutorials. I've tried to make the web version of the talk look like the real slides, but some of the characters are garbled (notably, double quotes appear as question marks in some places).

Rob Holte also has a talk on ROC Convex Hull for Comparing Machine Learning Schemes.


Programs

The programs described in the tutorial paper above are collected in the tar package ROC_algs.tar.gz.

Papers

These papers describe the ROC Convex Hull, its uses and its properties. The first paper listed is the most complete. The best basic introduction is probably the KDD-97 paper.

Robust Classification for Imprecise Environments
Foster Provost and Tom Fawcett.
Machine Learning, vol. 42, no. 3, March 2001.

Abstract: In real-world environments it usually is difficult to specify target operating conditions precisely, for example, target misclassification costs. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. In some cases, the performance of the hybrid actually can surpass that of the best known classifier. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. The hybrid also is efficient to build, to store, and to update. The hybrid is based on a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull (ROCCH) method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. Finally, we point to empirical evidence that a robust hybrid classifier indeed is needed for many real-world problems.
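
To illustrate the selection step (a toy sketch in my own notation, not the authors' code): under target conditions with positive-class prior p_pos and error costs cost_fp and cost_fn, an iso-performance line in ROC space has slope m = ((1 - p_pos) * cost_fp) / (p_pos * cost_fn), and the best classifier on the hull is the vertex such a line touches first, i.e., the one maximizing tpr - m * fpr:

    def best_on_hull(hull, p_pos, cost_fp, cost_fn):
        """Pick the hull vertex optimal for the given conditions."""
        m = ((1.0 - p_pos) * cost_fp) / (p_pos * cost_fn)
        return max(hull, key=lambda pt: pt[1] - m * pt[0])

    hull = [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]
    print(best_on_hull(hull, p_pos=0.5, cost_fp=1.0, cost_fn=1.0))  # (0.4, 0.9)
    # Rarer positives favor the more conservative classifier:
    print(best_on_hull(hull, p_pos=0.2, cost_fp=1.0, cost_fn=1.0))  # (0.1, 0.5)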

Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions
Foster Provost and Tom Fawcett. Presented at KDD-97 (Third International Conference on Knowledge Discovery and Data Mining), 1997.

Postscript version, PDF version.

Abstract: Applications of inductive learning algorithms to real-world data mining problems have shown repeatedly that using accuracy to compare classifiers is not adequate because the underlying assumptions rarely hold. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses.

The Case Against Accuracy Estimation for Comparing Induction Algorithms
Foster Provost, Tom Fawcett and Ron Kohavi. Presented at ICML-98 (Fifteenth International Conference on Machine Learning), July 1998.

Postscript version, PDF version.

(Though this paper is not about the ROC convex hull, it points out problems with using accuracy for evaluating classifiers and it motivates the need for ROC analysis.)

Abstract: We analyze critically the use of classification accuracy to compare classifiers on natural data sets, providing a thorough investigation using ROC analysis, standard machine learning algorithms, and standard benchmark data sets. The results raise serious concerns about the use of accuracy for comparing classifiers and draw into question the conclusions that can be drawn from such studies. In the course of the presentation, we describe and demonstrate what we believe to be the proper use of ROC analysis for comparative studies in machine learning research. We argue that this methodology is preferable both for making practical choices and for drawing scientific conclusions.

Robust Classification Systems for Imprecise Environments
Foster Provost and Tom Fawcett. Presented at AAAI-98 (Fifteenth National Conference on Artificial Intelligence), July 1998.

Postscript version, PDF version.

Abstract: In real-world environments, it is usually difficult to specify target operating conditions precisely. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. In some cases, the performance of the hybrid can actually surpass that of the best known classifier. The hybrid is also efficient to build, to store, and to update. Finally, we provide empirical evidence that a robust hybrid classifier is needed for many real-world problems.


Related Work

Other people's work building upon, or relevant to, the ROC Convex Hull.

Better Decisions through Science
John Swets, Robyn Dawes and John Monahan
Scientific American, October 2000, pp. 82-87.
Every day, important and complex yes-or-no diagnostic decisions are made throughout medicine, industry and society. Statistical methods of making those choices could dramatically improve the outcomes.
A fairly well-written, accessible introduction to ROC curves and their use in decision making.

Note on the Location of Optimal Classifiers in ROC Space
Ashwin Srinivasan. Oxford University Technical Report PRG-TR-2-99.

Abstract: The comparative study of classifier performance is a worthwhile concern in Machine Learning. Empirical comparisons typically examine unbiased estimates of predictive accuracy of different algorithms -- the assumption being that the classifier with the highest accuracy would be the "optimal" choice of classifier for the problem. The qualification on optimality is needed here, as choice is restricted to the classifiers being compared, and the estimates are typically subject to sampling errors. Comparisons based on predictive accuracy overlook two important practical concerns, namely (a) that class distributions cannot be specified precisely, so the distribution of classes in the training set is rarely matched exactly on new data; and (b) that the costs of different types of errors may be unequal. Using techniques developed in signal detection, Provost and Fawcett describe an elegant method for the comparative assessment of binary classifiers that takes these considerations into account. Their principal result is that optimal classifiers lie on the edge of the convex hull of points representing the classifier performance in a Lorentz diagram. The authors leave open the question of whether this result extends to classifiers that discriminate between more than two classes. Here we show that the result that optimal classifiers are located on the edge of the convex hull does extend to an arbitrary number of classes. This follows directly from results about convex sets that form the basis of linear programming algorithms.

An Assessment of ILP-assisted Models for Toxicology and the PTE-3 Experiment
Ashwin Srinivasan, Ross King and Douglas Bristol.
In Proceedings of the Ninth International Workshop on Inductive Logic Programming (ILP99).
(The authors use the ROCCH to evaluate ILP models for toxicology prediction.)

Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria
Chris Drummond and Robert C. Holte.
Presented at ICML-2000.

Abstract: This paper investigates how the splitting criteria and pruning methods of decision tree learning algorithms are influenced by misclassification costs or changes to the class distribution. Splitting criteria that are relatively insensitive to costs (class distributions) are found to perform, in terms of expected misclassification cost, as well as or better than splitting criteria that are cost sensitive. Consequently, there are two opposite ways of dealing with imbalance. One is to combine a cost-insensitive splitting criterion with a cost-insensitive pruning method to produce a decision tree algorithm little affected by cost or prior class distribution. The other is to grow a cost-independent tree which is then pruned in a cost-sensitive manner.

Explicitly Representing Expected Cost: An Alternative to ROC Representation
Chris Drummond and Robert C. Holte.
School of Information Technology and Engineering, University of Ottawa. Presented at KDD-2000.

Abstract: This paper proposes an alternative to ROC representation, in which the expected cost of a classifier is represented explicitly. This expected cost representation maintains many of the advantages of ROC representation, but is easier to understand. It allows the experimenter to immediately see the range of costs and class frequencies where a particular classifier is the best and quantitatively how much better it is than other classifiers. This paper demonstrates there is a point/line duality between the two representations. A point in ROC space representing a classifier becomes a line segment spanning the full range of costs and class frequencies. This duality produces equivalent operations in the two spaces, allowing techniques used in ROC analysis to be readily reproduced in the cost space.
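
As a rough gloss on the duality (my notation, not the paper's): a classifier at (FPR, TPR) in ROC space becomes the line expected_cost(pc) = FPR * (1 - pc) + (1 - TPR) * pc in cost space, where pc sweeps from 0 to 1 as class frequencies and misclassification costs vary, and the lower envelope of these lines plays the role the convex hull plays in ROC space. A tiny Python sketch:

    def cost_line(fpr, tpr):
        """Map one ROC point to its expected-cost line over pc in [0, 1]."""
        return lambda pc: fpr * (1.0 - pc) + (1.0 - tpr) * pc

    f = cost_line(0.1, 0.5)
    print(f(0.0), f(0.5), f(1.0))   # 0.1 0.3 0.5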

ROC Performance Evaluation of Web-based Bibliographic Navigator using Mining Association Rules
M. Kawahara and H. Kawano
Lecture Notes in Computer Science 1749, Springer. 5th International Computer Science Conference (ICSC'99), 1999.

Abstract: We have been developing a web-based intelligent navigation system, "mondou", using mining association rules. It is very helpful to provide search users with associative keywords derived by the mining algorithm. At present, based on the experimental results of the mining algorithms, we are trying to extend the search functions of the commercial text database "Open Text". However, in order to derive appropriate keywords at small computing cost, we have to determine several system parameters carefully, such as the Minsup and Minconf threshold values. In this paper, we focus on ROC graph techniques to evaluate the characteristics of derived rules. We propose an ROC analytical model of our search system and evaluate the performance of extended association rules. Finally, by using the ROC convex hull method, we try to specify the optimal threshold values for deriving effective rules from typical bibliographic data, the INSPEC database.

Comparing Classifiers when the Misallocation Costs are Uncertain
N.M. Adams and D.J. Hand. Pattern Recognition, vol. 32, no. 7, 1999, pp. 1139-1147.

Kelly Zou maintains an impressive bibliography of ROC papers and articles, mostly from the medical diagnostic literature.


Tom Fawcett (tom dot fawcett at gmail dot com)

Back to Tom Fawcett's homepage.
Modified: 3-Dec-2011