UCSC-CRL-03-01: Storage Embedded Networks (SEN) and Adaptive Caching using Multiple Experts (ACME)

06/02/2003 09:00 AM
Computer Science
The gap between CPU speeds and the speeds of the technologies that supply them with data is widening. As a result, the performance of processes is increasingly limited by the performance of storage devices, networks, and buses. Furthermore, the number of CPUs sharing these data-access resources is growing exponentially. Caching, prefetching, and parallelism are among the techniques used today to cope with I/O latency and to scale systems to more users. This paper describes the two major contributions of our ongoing research on distributed data access.

The first contribution is the design of the Storage Embedded Networks (SEN) architecture, which aims to improve user response times and scalability on the Internet through better distribution of caches. The SEN architecture is composed of trusted routers embedded with volatile and non-volatile storage that snoop on passing objects for caching. Requests are checked at every hop, ensuring that the closest copy on the data path is transmitted and that upstream load is reduced. The two main control overheads of other architectures, connection establishment and continuous inter-cache communication, do not exist in SEN.

The second contribution is the design of adaptive caching schemes using multiple experts, called ACME, which manage the SEN caches and further improve hit rates over static caching techniques. Machine learning algorithms rate and select the current best policies, or mixtures of policies, via weight updates based on their recent success. Each adaptive cache node can tune itself to the workload it observes. Since no cache databases or synchronization messages are exchanged to achieve adaptivity, clusters composed of these nodes are highly scalable and manageable. We propose to extend our preliminary designs and analysis in two directions.
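The weight-update idea behind ACME can be illustrated with a minimal multiplicative-weights sketch. This is not the paper's implementation: the class names, the two example policies (LRU and LFU), and the learning rate `eta` are illustrative assumptions. Each expert runs a "virtual" cache that records only whether it would have hit, and the weights shift toward whichever policy has recently been more successful.

```python
import math
from collections import OrderedDict

class LRUExpert:
    """Virtual LRU cache: tracks which keys it would hold, storing no data."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = OrderedDict()

    def access(self, key):
        hit = key in self.keys
        if hit:
            self.keys.move_to_end(key)          # mark as most recently used
        else:
            if len(self.keys) >= self.capacity:
                self.keys.popitem(last=False)   # evict least recently used
            self.keys[key] = True
        return hit

class LFUExpert:
    """Virtual LFU cache: evicts the least frequently accessed key."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def access(self, key):
        hit = key in self.counts
        if not hit:
            if len(self.counts) >= self.capacity:
                victim = min(self.counts, key=self.counts.get)
                del self.counts[victim]
            self.counts[key] = 0
        self.counts[key] += 1
        return hit

class AdaptiveCache:
    """One weight per policy expert; recent hits raise a weight, misses lower it."""
    def __init__(self, experts, eta=0.5):
        self.experts = experts
        self.eta = eta                           # illustrative learning rate
        self.weights = [1.0 / len(experts)] * len(experts)

    def access(self, key):
        hits = [expert.access(key) for expert in self.experts]
        for i, hit in enumerate(hits):
            loss = 0.0 if hit else 1.0
            self.weights[i] *= math.exp(-self.eta * loss)
        total = sum(self.weights)                # renormalize to sum to 1
        self.weights = [w / total for w in self.weights]
        return hits

# A recency-dominated trace: the LRU expert keeps hitting on the alternating
# c/d suffix, while LFU clings to the formerly popular keys, so the LRU
# expert's weight grows toward 1 without any inter-node communication.
cache = AdaptiveCache([LRUExpert(2), LFUExpert(2)])
for key in ['a'] * 5 + ['b'] * 5 + ['c', 'd'] * 10:
    cache.access(key)
print(cache.weights)  # LRU weight dominates
```

Because each node updates its weights purely from its own observed hits and misses, the tuning is entirely local, which is what allows the scheme to avoid synchronization traffic between cache nodes.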
The first is to compare Storage Embedded Networks (SEN) with existing hierarchical and distributed cache clusters in terms of user response times, network bandwidth usage, server load reduction, and scalability. For this part we will run large-scale simulations over realistic topologies using real web proxy, file system, and raw disk traces. We will begin these comparisons with static caching techniques. In the second part we will introduce adaptive caching techniques to eliminate the manual tuning and manual topological placement that static caches require. We will measure the performance improvements gained by the adaptive techniques and probe their performance limits against theoretically optimal algorithms. We will also quantify the time and space complexities of our schemes. Real system implementations will help us optimize our designs.
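The per-hop behavior such simulations would model can be sketched as follows. This is a simplified, hypothetical model, not the SEN implementation: the names `SENRouter` and `fetch`, the LRU store, and the per-router capacities are assumptions for illustration. A request walks the routers from client toward origin, each hop checks its embedded store, and on a total miss the routers snoop the object from the response passing back through them.

```python
from collections import OrderedDict

class SENRouter:
    """A router with an embedded LRU object store, in the spirit of SEN."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.store = OrderedDict()

    def has(self, key):
        if key in self.store:
            self.store.move_to_end(key)       # refresh recency on a hit
            return True
        return False

    def snoop(self, key):
        """Cache an object observed passing through on the response path."""
        if key not in self.store and len(self.store) >= self.capacity:
            self.store.popitem(last=False)    # evict least recently used
        self.store[key] = True
        self.store.move_to_end(key)

def fetch(path, key):
    """Walk routers ordered client -> origin; the request is checked at
    every hop, so the closest cached copy on the data path answers it."""
    for hops, router in enumerate(path, start=1):
        if router.has(key):
            return router.name, hops
    # Total miss: the origin server responds, and every router on the
    # return path snoops the passing object into its embedded store.
    for router in path:
        router.snoop(key)
    return 'origin', len(path) + 1

path = [SENRouter('edge', 4), SENRouter('core', 4), SENRouter('border', 4)]
print(fetch(path, '/logo.png'))   # first request travels to the origin
print(fetch(path, '/logo.png'))   # now served by the closest hop, 'edge'
```

Note that no connection setup or cache-to-cache messages appear anywhere in the lookup: hop-by-hop checking of the request itself is what finds the closest copy, which is the property the planned comparisons against hierarchical and distributed cache clusters would quantify.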