A number of results have bounded generalization error of a classifier in terms of its margin on the training points. There has been some debate about whether the minimum margin is the best measure of the distribution of training set margin values with which to estimate the generalization error. Freund and Schapire [7] have shown how a different function of the margin distribution can be used to bound the number of mistakes of an on-line learning algorithm for a perceptron, as well as an expected error bound. Shawe-Taylor and Cristianini [13] showed that a slight generalization of their construction can be used to give a PAC-style bound on the tail of the distribution of the generalization errors that arise from a given sample size when using threshold linear classifiers. We show that in the linear case the approach can be viewed as a change of kernel and that the algorithms arising from the approach are exactly those originally proposed by Cortes and Vapnik [4]. We generalise the basic result to function classes with bounded fat-shattering dimension and the ℓ1 measure for slack variables, which gives rise to Vapnik's box constraint algorithm. Finally, application to regression is considered, which includes standard least squares as a special case.
Shawe‐Taylor et al. studied this question.