UCSC-CRL-90-45: PROXIMITY CONTENT-ADDRESSABLE MEMORY: AN EFFICIENT EXTENSION TO k-NEAREST NEIGHBORS SEARCH

09/01/1990 09:00 AM
Computer Engineering
This thesis presents the development of *proximity content addressable memory (PCAM)*, an extension to nearest neighbors search for high dimensional parameter spaces with variable weighting. In PCAM, the parameter space is mapped to Boolean space by a family of matching and magnitude comparison functions. A weighted distance is computed. Three methods were developed, the first two being software algorithms for conventional processors, and the third a hardware implementation. The first algorithm preprocesses the query into a look-up table, whereas the second preprocesses the data points into an inverted index. The latter is preferable when indexing is sparse or the queries contain few attributes. The hardware implementation represents several extensions to prior *content addressable memory (CAM)* systems and uses a modified tree architecture. A survey was unable to identify any software algorithms asymptotically better than exhaustive search for high-dimensional spaces with variable distance weighting. All the PCAM implementations have time and space linear in the number of data points and dimension of the parameter space, excluding read out. The PCAM algorithms equal the performance of the best k- nearest neighbors algorithms surveyed. For an example database, a 10 to 100 times speed up over linear search is achieved. A hardware system with fewer than 100 processing elements is projected to provide one to two orders of magnitude acceleration for personal computers and workstations. Systems with thousands of processing elements appear feasible. If one hardware processing element is available for each data point, time O(1) and space O(N) can be achieved, excluding read out time. A CMOS VLSI chip containing 8 processing elements and the associated response resolver network has recently been completed. Applications where PCAM may provide constant factor or asymptotically better space-time performance include information retrieval based on similarity, computer aided VLSI and circuit board design, and pattern matching and classfication in the presence of noisy, erroneous, or incomplete data. Efficient recall based on content and similarity may also prove important in the development of intelligent systems.

This report is not available for download at this time.