Description
The goal of learning theory is to approximate a function from sample values. To attain this goal, learning theory draws on a variety of diverse subjects, notably statistics, approximation theory, and algorithmics. Ideas from all these areas have blended to form a subject whose many successful applications have fueled its rapid growth over the last two decades. This is the first book to give a general overview of the theoretical foundations of the subject, emphasizing the approximation theory viewpoint while still giving balanced coverage of its other aspects. It is based on courses taught by the authors and is reasonably self-contained, so it will appeal to a broad spectrum of researchers in learning theory and adjacent fields. It will also serve as an introduction for graduate students and others entering the field who wish to see how the problems raised in learning theory relate to other disciplines.
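To make the opening sentence concrete, here is a minimal sketch of the least squares setting underlying the chapters below, written in the standard notation of the field; it is supplied for orientation and is not quoted from the book. Given a sample z = ((x_1, y_1), ..., (x_m, y_m)) drawn from a probability measure ρ on X × Y, the function to approximate is the regression function, and a learner returns the empirical target function from a hypothesis space H:

    % Regression function (the ideal predictor) and its empirical estimate
    % from m samples, minimizing the least squares empirical error over H.
    f_\rho(x) = \int_Y y \, d\rho(y \mid x),
    \qquad
    f_{\mathbf{z}} = \operatorname*{arg\,min}_{f \in \mathcal{H}}
      \frac{1}{m} \sum_{i=1}^{m} \bigl( f(x_i) - y_i \bigr)^2 .

The excess error of f_z splits into a sample error and an approximation error; estimating each term and balancing the two (the bias–variance problem) is the program the table of contents traces.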
Contents
1 The framework of learning
1.5 The bias–variance problem
1.6 The remainder of this book
1.7 References and additional remarks
2 Basic hypothesis spaces
2.1 First examples of hypothesis space
2.3 Hypothesis spaces associated with Sobolev spaces
2.4 Reproducing Kernel Hilbert Spaces
2.6 Hypothesis spaces associated with an RKHS
2.8 On the computation of empirical target functions
2.9 References and additional remarks
3 Estimating the sample error
3.1 Exponential inequalities in probability
3.2 Uniform estimates on the defect
3.3 Estimating the sample error
3.4 Convex hypothesis spaces
3.5 References and additional remarks
4 Polynomial decay of the approximation error
4.2 Operators defined by a kernel
4.5 Characterizing the approximation error in RKHSs
4.7 References and additional remarks
5 Estimating covering numbers
5.2 Covering numbers for Sobolev smooth kernels
5.3 Covering numbers for analytic kernels
5.4 Lower bounds for covering numbers
5.5 On the smoothness of box spline kernels
5.6 References and additional remarks
6 Logarithmic decay of the approximation error
6.1 Polynomial decay of the approximation error for … kernels
6.2 Measuring the regularity of the kernel
6.3 Estimating the approximation error in RKHSs
6.5 References and additional remarks
7 On the bias–variance problem
7.3 A concrete example of bias–variance
7.4 References and additional remarks
8 Least squares regularization
8.1 Bounds for the regularized error
8.2 On the existence of target functions
8.3 A first estimate for the excess generalization error
8.6 Compactness and regularization
8.7 References and additional remarks
9 Support vector machines for classification
9.2 Regularized classifiers
9.3 Optimal hyperplanes: the separable case
9.4 Support vector machines
9.5 Optimal hyperplanes: the nonseparable case
9.6 Error analysis for separable measures
9.7 Weakly separable measures
9.8 References and additional remarks
10 General regularized classifiers
10.1 Bounding the misclassification error in terms of the generalization error
10.2 Projection and error decomposition
10.3 Bounds for the regularized error D(γ, φ) of f_γ^φ
10.4 Bounds for the sample error term involving f_γ^φ
10.5 Bounds for the sample error term involving f_{z,γ}^φ
10.6 Stronger error bounds
10.7 Improving learning rates by imposing noise conditions
10.8 References and additional remarks