Paper abstract

A Novel Scalable and Data Efficient Feature Subset Selection Algorithm

Sergio Rodrigues de Morais - INSA-Lyon, France
Alex Aussem - Universite de Lyon 1, France

Session: Feature Selection
Springer Link: http://dx.doi.org/10.1007/978-3-540-87481-2_20

In this paper, we aim to identify the minimal subset of discrete random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. We present a novel Markov boundary learning algorithm that is scalable, data efficient, and correct under the so-called faithfulness condition. We report extensive empirical experiments on synthetic and real data sets scaling up to 139,351 variables.
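
To make the notion of a Markov boundary concrete, the sketch below reads it off a Bayesian network whose structure is already known. In that setting, under faithfulness, the Markov boundary of a variable consists of its parents, its children, and the other parents of its children (its spouses). This is not the learning algorithm of the paper, which must recover the boundary from data. The function name `markov_boundary`, the dictionary encoding of the graph, and the toy network are all assumptions made for illustration.

```python
def markov_boundary(parents, target):
    """Return the parents, children, and spouses of `target` in a known DAG.

    `parents` maps each variable name to a list of its parents
    (a hypothetical encoding chosen for this sketch).
    """
    pa = set(parents.get(target, ()))
    # Children are the variables that list `target` among their parents.
    children = {v for v, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = {p for c in children for p in parents[c]} - {target}
    return pa | children | spouses


# Hypothetical toy network: A -> T -> C <- B, C -> D, and an isolated E.
toy_dag = {
    "A": [],
    "B": [],
    "T": ["A"],
    "C": ["T", "B"],
    "D": ["C"],
    "E": [],
}

print(sorted(markov_boundary(toy_dag, "T")))  # ['A', 'B', 'C']
```

In this toy network, D and E lie outside the boundary of T: once A, B, and C are known, they carry no further information about T. A feature subset selection method built on this idea would keep only the boundary variables as classifier inputs.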