UCSC-CRL-02-36: AN OPTIMAL ALGORITHM FOR ONLINE REORGANIZATION OF REPLICATED DATA

11/25/2002 09:00 AM
Computer Science
As storage systems scale to thousands of disks, data distribution and load balancing become increasingly important. We present an algorithm for allocating data objects to disks as a system as it grows from a few disks to hundreds or thousands. A client using our algorithm can locate a data object in microseconds without consulting a central server or maintaining a full mapping of objects or buckets to disks. Despite requiring little global configuration data, our algorithm is probabilistically optimal in both distributing data evenly and minimizing data movement when new storage is added to the system. Moreover, our algorithm support weighted allocation and variable levels of object replication, both of which are needed to efficiently support growing systems while technology changes.

UCSC-CRL-02-36