Paper abstract

Proper Model Selection with Significance Test

Jin Huang - University of Ottawa, Canada
Charles X. Ling - University of Western Ontario, Canada
Harry Zhang - University of New Brunswick, Canada
Stan Matwin - University of Ottawa, Canada

Session: Classifier Evaluation

Model selection is an important and ubiquitous task in machine learning. To select the model with the best future classification performance as measured by a goal metric, an evaluation metric is often used to choose among the competing classification models. A common practice is to use the same metric as both goal and evaluation metric. However, several recent studies claim that using an evaluation metric (such as AUC) different from the goal metric (such as accuracy) leads to better selection of the correct models. In this paper, we point out a flaw in the experimental design of those studies and propose an improved method to test the claim. Our extensive experiments show convincingly that only the goal metric itself can most reliably select the correct classification models.
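To make the goal-metric versus evaluation-metric distinction concrete, the following sketch (not from the paper; model names, scores, and labels are invented for illustration) selects between two hypothetical models on a toy validation set, once by accuracy and once by AUC. AUC is computed with the standard Mann-Whitney rank formula. The two metrics can disagree: a model can rank examples perfectly (AUC 1.0) yet place its scores on the wrong side of the decision threshold, hurting accuracy.

```python
def accuracy(y_true, scores, threshold=0.5):
    # fraction of examples whose thresholded prediction matches the label
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)

def auc(y_true, scores):
    # Mann-Whitney form of AUC: probability that a random positive
    # is scored above a random negative (ties count as 0.5)
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_model(models, y_true, metric):
    # pick the model whose validation scores maximize the given metric
    return max(models, key=lambda name: metric(y_true, models[name]))

# toy validation labels and two invented models' probability scores
y = [1, 1, 1, 0, 0, 0]
models = {
    # "A": perfect ranking, but every score falls below the 0.5 threshold
    "A": [0.45, 0.44, 0.43, 0.42, 0.41, 0.40],
    # "B": confident scores with one ranking error
    "B": [0.9, 0.9, 0.1, 0.9, 0.1, 0.1],
}

best_by_acc = select_model(models, y, accuracy)  # selects "B"
best_by_auc = select_model(models, y, auc)       # selects "A"
```

The paper's question is which of these two selection rules better predicts performance on the goal metric for unseen data; the experiments indicate that evaluating with the goal metric itself is the most reliable choice.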