UCSC-CRL-93-13: SAMPLE COMPRESSION, LEARNABILITY, AND THE VAPNIK-CHERVONENKIS DIMENSION

March 1, 1993
Computer Science
Within the framework of pac-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A concept is a subset of some instance domain X, and a concept class is a set of concepts. A sample compression scheme of size d for a concept class C consists of a compression function and a reconstruction function. The compression function, given a finite sample set consistent with some concept in C, chooses a subset of at most d examples as the compression set. The reconstruction function, given a compression set, reconstructs a hypothesis on X. Given a compression set produced by the compression function from a sample of a concept in C, the reconstruction function must be able to produce a hypothesis consistent with that sample. We demonstrate that the existence of a fixed-size sample compression scheme for a class C is sufficient to ensure that the class C is learnable. We define maximum and maximal classes of VC dimension d. For every maximum class of VC dimension d, there is a sample compression scheme of size d, and for sufficiently large maximum classes there is no sample compression scheme of size less than d. We briefly discuss classes of VC dimension d that are maximal but not maximum. It is unknown whether there is a sample compression scheme of size d for every class of VC dimension d.
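To make the compression/reconstruction pairing concrete, here is a minimal sketch for one classical concept class: closed intervals [a, b] on the real line, which have VC dimension 2 and admit a compression scheme of size 2 (the two extreme positive examples). The function names `compress` and `reconstruct` are illustrative, not taken from the report.

```python
def compress(sample):
    """Compression function: keep at most 2 examples, the extreme
    positive points of a sample consistent with some interval."""
    positives = [x for x, label in sample if label]
    if not positives:
        return []  # empty compression set -> all-negative hypothesis
    return [min(positives), max(positives)]

def reconstruct(compression_set):
    """Reconstruction function: rebuild a hypothesis (a membership
    predicate on X) from the compression set alone."""
    if not compression_set:
        return lambda x: False
    lo, hi = min(compression_set), max(compression_set)
    return lambda x: lo <= x <= hi

# The reconstructed hypothesis must be consistent with the full sample:
sample = [(0.5, False), (1.0, True), (2.0, True), (3.5, True), (4.0, False)]
h = reconstruct(compress(sample))
assert all(h(x) == label for x, label in sample)
```

Note that `reconstruct` sees only the retained examples, not the original sample; consistency with every discarded example is exactly what the definition of a compression scheme demands.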