UCSC-CRL-90-11: DEBUGGING PARALLEL PROGRAMS BY TRACE ANALYSIS

03/01/1990 09:00 AM
Computer Science
One of the fundamental problems encountered when debugging a parallel program is determining the race conditions in the program. A race condition may exist when two or more tasks access shared data in an unspecified order and at least one of them is executing a write operation. This paper discusses this problem and presents a tool for automatically detecting race conditions in parallel programs by analyzing program traces. In a parallel system, events can occur concurrently. However, programmers are often forced to rely on misleading sequential traces for information about their program\'s behavior. We view a program execution as a partial ordering of events, and define which executions are consistent with a given trace. Although it is generally not possible to determine which of the executions occurred, we define the notion of ``safe orderings\'\' which are guaranteed to occur in every execution that is consistent with the trace. We present a series of algorithms which extract many of the ``safe orderings\'\' from a sequential trace with anonymous semaphore-style synchronization. The initialization algorithm starts from a sequential trace and creates a partially ordered canonical execution. The rewinding algorithm strips away the ordering relationships particular to the canonical execution, so that the resulting partial order is safe. The expanding algorithm increases the amount of ordering information while maintaining a safe partial order. An algorithm is also presented to detect critical regions, to distinguish unordered sequential events from concurrent events. We have focused on event style communication and counting semaphore models because they are one of the most frequently used parallel models, especially in various parallel Fortran versions. The ideas and the algorithms should be applicable to other parallel systems. A trace analyzer has been implemented, and some experiments have been performed. The current implementation is used to analyze traces generated by IBM Parallel Fortran. The trace analyzer can report various data race conditions in parallel programs by finding unordered/concurrent events and variable access conflicts.

This report is not available for download at this time.