Sean Halle
05/19/2006 01:04 AM
Computer Engineering
This paper introduces a run-time system that is scalable, efficient, and implements an intermediate format for hardware-independent parallel software. It is a peer-to-peer system, having no centralized functions, with peers organized into a tree-of-fully-connected-graphs. The coordination overhead (processing plus number of control messages) grows logarithmically with the number of processing nodes, allowing for scaling to millions of nodes. The run-time load-balances by dynamically dividing data and choosing the peer within a group (fully connected graph) which is best suited to process each piece.
The run-time system implements the computation model of a platform for parallel software called CodeTime. This computation model uses declarative scheduling constraints. Declarative constraints allow the run-time’s scheduler to dynamically choose the size of data for each unit of scheduled work during a run.
The choice-of-data-size per scheduling event during a run allows the same software, distributed in an intermediate format, to run efficiently on widely varying parallel hardware. Each hardware platform has a custom run-time that chooses sizes such that the computation-to-communication ratio is balanced to overlap communication while maintaining high processor utilization and low scheduler overhead.
Results are given for a run-time written, in Java, for a collection of workstations connected by a local-area network. A test problem was executed on a variety of configurations, and results indicate that scheduling overhead remained a small percentage of total computation time, control overhead increased as the log of the number of processors, and that utilization remained high across configurations.