Paper abstract

Proper Model Selection with Significance Test

Jin Huang - University of Ottawa, Canada
Charles X. Ling - University of Western Ontario, Canada
Harry Zhang - University of New Brunswick, Canada
Stan Matwin - University of Ottawa, Canada

Session: Classifier Evaluation
Springer Link: http://dx.doi.org/10.1007/978-3-540-87479-9_53

Model selection is an important and ubiquitous task in machine learning. To select models with the best future classification performance, measured by a goal metric, an evaluation metric is often used to choose the best classification model among the competing ones. A common practice is to use the same metric as both the goal and the evaluation metric. However, several recent studies claim that using an evaluation metric (such as AUC) other than the goal metric (such as accuracy) results in better selection of the correct models. In this paper, we point out a flaw in the experimental design of those studies and propose an improved method to test the claim. Our extensive experiments show convincingly that only the goal metric itself can most reliably select the correct classification models.
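To make the goal-metric versus evaluation-metric distinction concrete, below is a minimal sketch (not the paper's experimental protocol) of selecting among candidate classifiers by cross-validated scores under two different metrics; the dataset, candidate models, and fold count are illustrative assumptions.

```python
# Illustrative sketch only: model selection under two different evaluation
# metrics (accuracy vs. AUC). The data and candidate models are assumptions,
# not the setup used in the paper.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (placeholder for a real benchmark).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Select the "best" model once with accuracy as the evaluation metric and
# once with AUC; the selected model may differ depending on the metric used.
for scoring in ("accuracy", "roc_auc"):
    scores = {
        name: cross_val_score(clf, X, y, cv=5, scoring=scoring).mean()
        for name, clf in candidates.items()
    }
    best = max(scores, key=scores.get)
    print(f"selected by {scoring}: {best} ({scores[best]:.3f})")
```

If the goal metric is accuracy, the paper's claim is that selecting by cross-validated accuracy itself (rather than by a surrogate such as AUC) is the most reliable choice.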