12/01/1990 09:00 AM
Computer Engineering
We describe a generalization of the PAC learning model that is based on statistical decision theory. In this model the learner receives randomly drawn examples, each example consisting of an instance x in X and an outcome y in Y , and tries to find a hypothesis h : X --> A , where h in H , that specifies the appropriate action a in A to take for each instance x , in order to minimize the expectation of a loss l(y,a). Here X, Y, and A are arbitrary sets, l is a real-valued function, and examples are generated according to an arbitrary joint distribution on X times Y . Special cases include the problem of learning a function from X into Y , the problem of learning the conditional probability distribution on Y given X (regression), and the problem of learning a distribution on X (density estimation). We give theorems on the uniform convergence of empirical loss estimates to true expected loss rates for certain hypothesis spaces H , and show how this implies learnability with bounded sample size, disregarding computational complexity. As an application, we give distribution-independent upper bounds on the sample size needed for learning with feedforward neural networks. Our theorems use a generalized notion of VC dimension that applies to classes of real-valued functions, adapted from Pollard\'s work, and a notion of *capacity* and *metric dimension* for classes of functions that map into a bounded metric space.