Paper abstract

Learning with $L_{q<1}$ vs $L_1$-norm regularisation with exponentially many irrelevant features

Ata Kaban - University of Birmingham, UK
Robert Durrant - University of Birmingham, UK

Session: Regression
Springer Link: http://dx.doi.org/10.1007/978-3-540-87479-9_56

We study the use of fractional norms for regularisation in supervised learning from high-dimensional data in the presence of a large number of irrelevant features, focusing on logistic regression. We develop a variational method for parameter estimation and show an equivalence between two approximations recently proposed in the statistics literature. Building on previous work by A. Ng, we show that fractional-norm regularised logistic regression enjoys a sample complexity that grows logarithmically with the data dimension and polynomially with the number of relevant dimensions. In addition, extensive empirical testing indicates that fractional-norm regularisation is more suitable than $L_1$ when the number of relevant features is very small, and works very well despite a large number of irrelevant features.
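To make the setting concrete, the sketch below fits logistic regression with an $L_q$ ($q<1$) penalty on toy data where only 2 of 50 features are relevant. This is an illustration only: the paper develops a variational estimation method, whereas this sketch uses a common stand-in, plain gradient descent on a smoothed surrogate $|w_j|^q \approx (w_j^2 + \epsilon)^{q/2}$; all function names, data, and hyperparameters here are hypothetical choices, not the authors'.

```python
import numpy as np

def lq_logistic_fit(X, y, q=0.5, lam=0.05, eps=1e-3, n_iter=500, lr=0.1):
    """Logistic regression with an L_q (q < 1) penalty, q = 0.5 by default.

    Minimises  (1/n) * logistic loss  +  lam * sum_j (w_j^2 + eps)^(q/2)
    by gradient descent. The eps-smoothing keeps the non-convex penalty
    differentiable at zero; this is an illustrative stand-in for the
    variational method developed in the paper.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
        grad_loss = X.T @ (p - y) / n             # logistic loss gradient
        # d/dw_j of lam * (w_j^2 + eps)^(q/2):
        grad_pen = lam * q * w * (w**2 + eps) ** (q / 2 - 1)
        w -= lr * (grad_loss + grad_pen)
    return w

# Toy data: 2 relevant features out of 50 (the regime the paper studies,
# with many irrelevant dimensions and very few relevant ones).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = (X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(200) > 0).astype(float)
w = lq_logistic_fit(X, y)
```

With $q<1$ the penalty is non-convex, so gradient descent only finds a local optimum; in practice the fit should still concentrate most of the weight mass on the two relevant features while the 48 irrelevant coefficients stay near zero.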