UCSC-SOE-09-06: DKU Pattern for Performance Portable Parallel Software

Authors Sean Halle

Published On 02/12/2009 09:00 AM

Department Computer Engineering

Abstract The shift to an ever-increasing number of cores on a chip is driving the need for parallel programming methods that allow "single source, multiple hardware, high performance on each". The difficulty of designing such methods is exemplified in the embedded industry. There, the tradition is hand-coding for ultra-high performance on specialized architectures, so the single source must be automatically transformed to give performance comparable to hand-coding. However, the source should have no knowledge of the target multiprocessor system-on-chip (MPSoC); it must be written in a generic way that provides the information an automated process will use to make the program efficient on a particular chip. To achieve this, the programmer must state, in essence, how to change the size of a scheduled unit of work so that automated task-size tuning can take place. For the case of data parallelism, we propose a programming pattern that helps provide this information, called DKU, short for "Divider-Kernel-Undivider". In this construct, the programmer writes three separate pieces of code: code that divides the iteration space plus data structure into pieces; code that computes the answer for one piece; and code that puts the individual answers together into the larger answer. Because the programmer provides this code, the pattern can work with any data structure, leaving the programmer free to choose structures natural to the problem. DKU has been implemented as part of the Open Media Platform project, where it is added to Java and supported by a web infrastructure that specializes the single source to multiple hardware platforms. A client device requests a program and automatically receives the executable specialized to that client's hardware.
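
The three-part structure described in the abstract can be illustrated with a minimal Java sketch. It is not the Open Media Platform API: the interface names (Divider, Kernel, Undivider), the DkuSketch class, the Range type, and the numPieces parameter are all hypothetical, chosen only to show the shape of the pattern. Here numPieces stands in for the task size that an automated tuner would choose per target chip, and a parallel stream stands in for the platform's scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; every name here is hypothetical, not the OMP API.
final class DkuSketch {

    /** Divider: splits the data and its iteration space into pieces. */
    interface Divider<D, P> { List<P> divide(D data, int numPieces); }

    /** Kernel: computes the answer for one piece. */
    interface Kernel<P, R> { R compute(P piece); }

    /** Undivider: combines the per-piece answers into the larger answer. */
    interface Undivider<R, A> { A undivide(List<R> partials); }

    /** One piece: the half-open index range [start, end) of a shared array. */
    static final class Range {
        final int[] data;
        final int start;
        final int end;

        Range(int[] data, int start, int end) {
            this.data = data;
            this.start = start;
            this.end = end;
        }
    }

    // Divider for an int array: at most numPieces contiguous ranges.
    static final Divider<int[], Range> DIVIDER = (data, numPieces) -> {
        int n = data.length;
        int count = Math.max(1, Math.min(numPieces, n));
        List<Range> pieces = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int start = (int) ((long) n * i / count);
            int end = (int) ((long) n * (i + 1) / count);
            pieces.add(new Range(data, start, end));
        }
        return pieces;
    };

    // Kernel: sums the elements of one range.
    static final Kernel<Range, Long> KERNEL = piece -> {
        long sum = 0;
        for (int i = piece.start; i < piece.end; i++) {
            sum += piece.data[i];
        }
        return sum;
    };

    // Undivider: adds the partial sums together.
    static final Undivider<Long, Long> UNDIVIDER = partials -> {
        long total = 0;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    };

    /** Generic driver: divide, run the kernel on each piece in parallel, undivide. */
    static <D, P, R, A> A run(Divider<D, P> divider, Kernel<P, R> kernel,
                              Undivider<R, A> undivider, D data, int numPieces) {
        List<R> partials = divider.divide(data, numPieces)
                .parallelStream()
                .map(kernel::compute)
                .collect(Collectors.toList());
        return undivider.undivide(partials);
    }

    /** Sums an array using the three pieces above. */
    static long sumWith(int[] data, int numPieces) {
        return run(DIVIDER, KERNEL, UNDIVIDER, data, numPieces);
    }
}
```

Calling DkuSketch.sumWith(data, 4) and DkuSketch.sumWith(data, 64) returns the same sum. Only the size of each scheduled unit of work changes, which is the degree of freedom the abstract says an automated tuning process needs. Because the divider and undivider are supplied by the programmer, the same driver would work for a tree, a sparse matrix, or any other structure for which those two pieces can be written.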
