O. Catoni (ENS, France)
Unsupervised clustering in reproducing kernel Hilbert spaces
One great challenge in the analysis and interpretation of high-dimensional signals (such as digital images) is to derive invariant classification algorithms. Even the simplest transformations, such as translations, when applied to a single signal instance, generate a set of signals spanning a high-dimensional vector space, a feature that causes serious trouble for most statistical inference approaches. In this talk, we propose to combine two tools: reproducing kernel Hilbert spaces, which cast the representation of the data into a space of higher (or even potentially infinite) dimension, and a label aggregation scheme, where the labels are the principal components of the data set, which reduces the dimension of the representation. To make this scheme operational, we need to perform PCA in high (or even infinite) dimension. For this purpose we will introduce dimension-free PAC-Bayes bounds for PCA, the counterpart of PAC-Bayes margin bounds for Support Vector Machines. These bounds show that it is possible to obtain stable results when applying our clustering algorithm to an i.i.d. sample of signals drawn from an unknown probability distribution.
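As a point of reference, the sketch below illustrates the general pipeline the abstract alludes to: embed the data in an RKHS via a kernel, perform PCA in that feature space (kernel PCA), and cluster on the reduced representation. It is a minimal illustration under assumed choices (an RBF kernel, a hand-picked bandwidth, k-means as the clustering step), not the speaker's specific label aggregation scheme or the PAC-Bayes analysis discussed in the talk.

```python
# Minimal sketch: RKHS embedding via an RBF kernel, kernel PCA, then
# clustering in the reduced space. All parameter choices are illustrative.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(K, n_components):
    # Center the Gram matrix in feature space, then keep leading eigenvectors.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the sample onto the principal directions in the RKHS.
    return vecs * np.sqrt(np.maximum(vals, 0.0))

def kmeans(Z, k, n_iter=100, seed=0):
    # Plain Lloyd iterations on the reduced representation.
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None, :] - centers[None, :, :])**2).sum(-1), axis=1)
        centers = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Toy usage: two noisy rings, not linearly separable in the input space.
rng = np.random.default_rng(1)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)])
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
X += 0.1 * rng.standard_normal(X.shape)

K = rbf_kernel(X, gamma=2.0)
Z = kernel_pca(K, n_components=2)
labels = kmeans(Z, k=2)
print(labels[:10])
```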
29.11.2012