CHAPTER 2
SOFTWARE DEBUGGING
Software debugging is one of the major tasks of programming. With the advent of concurrent programming, another degree of complexity was added to the already difficult debugging process. This chapter presents the basic concepts inherent in software debugging as well as a list of errors related to concurrency. An introduction to the basic terminology related to debugging and concurrency follows.
When the results of a program are examined, errors may be found through the discovery of their effects. Debugging is the activity of finding the causes that produced undesirable program effects. Two basic techniques are used to debug a program. In one technique, called tracing, the program is instrumented to produce a trace. In tracing, the program runs to completion without external intervention, and information about selected states of the program is saved. After the program has executed, this information is studied to find the causes of the program's malfunction. A program malfunction is called a bug.
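The idea of tracing can be made concrete with a small sketch (in Python, for brevity; the names probe and buggy_sum are invented for illustration). The program is instrumented with probe calls that append selected state to a trace, and the trace is examined only after the run completes:

```python
# Minimal sketch of trace-based debugging: probe() calls record selected
# program state into a trace; the trace is studied only post-mortem.
trace = []

def probe(label, **state):
    """Record a snapshot of selected program state."""
    trace.append((label, dict(state)))

def buggy_sum(values):
    total = 0
    for i, v in enumerate(values):
        total += v
        probe("loop", i=i, v=v, total=total)   # instrumentation point
    return total

result = buggy_sum([1, 2, 3])

# Post-mortem examination: study the recorded states to locate a fault.
for label, state in trace:
    print(label, state)
```

Note that the instrumentation itself changes the program image, which is precisely the intrusiveness issue discussed below.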
In the other technique, controlled execution, a program is instrumented to pause its execution at breakpoints. During the execution of such a program, every time a breakpoint is reached, the program halts, allowing the user to study the state of the program. A faulty program is executed and debugged through many such cycles until the fault is localized and fixed. Concurrent programs pose a new problem in debugging. According to LeBlanc [LeBl87],
Debugging sequential programs is a well-understood task that draws on tools and techniques developed over many years . . . Debugging parallel programs is considerably more difficult because parallel programs are often not deterministic.
LeBlanc also states that
A major difference between sequential and concurrent programs is the definition of program state in each case. The state of a sequential program can be characterized simply by a program counter value and a memory image of its data. The state of a concurrent program requires such information to be considered for each of its processes.
According to Marinescu [Mari89], monitoring the execution of an application through some form of instrumentation can provide the programmer with insight into the interaction between the system and the application. Software and/or hardware sensors are used to probe the program's activity. This information is then passed to the user for detecting and correcting a faulty program.
Intrusive monitors may affect the order in which events take place in a concurrent program, thus affecting its outcome. Intrusive software may block the occurrence of an event that causes a program to produce an incorrect response. An intrusive monitor is commonly implemented in the user's program through a pre-processor, as in MONGRAF [Male88]. Other intrusive monitors are implemented by manually inserting monitoring calls into strategic locations of the program source code. One example can be found in Moran's work [Mora85]. The pre-processor replaces the human intervention of inserting the monitoring calls. In both cases, the source program is modified and debugging must be performed with the monitored program image, which, to some degree, changes the form in which the original program is written and subsequently executed.
A non-intrusive monitor does not in any way affect the order in which events take place in a concurrent program. Non-intrusive monitors are usually hardware-implemented. The MAD system (Monitoring, Animating, Debugging) is an example of a non-intrusive monitor. According to Rubin et al. [Rubi89],
Non-intrusiveness is crucial because parallel programs are notoriously dependent on the relative execution speed of their components. A program may have a radically different performance under an intrusive animation or monitoring system.
Low-level debugging, also called source-level debugging, concentrates on errors that are expressed by source language statements. Such debugging allows the examination of post-mortem dumps, the use of trace features, and interactive memory examination. For instance, DbxTool [Adam86] is a low-level debugger.
High-level debugging concentrates on outcomes that are only expressed in higher abstractions. High-level debugging is useful for debugging complex abstractions and large parallel programs [Aral89]. This type of debugger classifies the program space by its different elements (data and statements), providing the user with more refined feedback. For instance, BELVEDERE [Houg89] is a high-level debugger.
2.8. Very High-Level Debugging
Very high-level debugging performs automatic bug diagnosis for logical and semantic errors. Expert systems are prepared to identify non-syntactic bugs, or ill-defined patterns of interprocess interactions, and, when possible, to guide the programmer in correcting the bugs. PROUST [John86] and ADAT [Lope91c] are examples of such debuggers. At their current stage of development these systems are useful in educational environments as novice programmer aids. Each system is discussed in Chapter Three and Chapter Four, respectively.
In static analysis, also called static data flow analysis, the source code is analyzed to obtain certain deductions about the program's control paths. This allows the detection of a significant class of programming errors. These errors are also referred to in the literature [Tayl80] as programming anomalies. Examples of anomalies detected by static analysis are: undeclared variables, dead variable declarations, and concurrent read/write operations on a shared variable within at least two processes. This last anomaly is further discussed at the end of this chapter and also in Chapter Four.
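The flavor of such a check can be shown with a toy Python sketch (not a real parser; the "declare" syntax and the source fragment are invented for illustration). It scans a source text, collects declared identifiers, and flags those never referenced afterwards, i.e., dead variable declarations:

```python
import re

# Toy static analysis: collect declared identifiers from a source fragment
# and flag those never referenced afterwards ("dead" declarations).
# The "declare" syntax is invented for illustration only.
source = """
declare x
declare y
declare temp
x = 1
y = x + 2
"""

declared, referenced = set(), set()
for line in source.strip().splitlines():
    m = re.match(r"declare\s+(\w+)", line)
    if m:
        declared.add(m.group(1))          # a declaration line
    else:
        referenced |= set(re.findall(r"[A-Za-z_]\w*", line))  # uses

dead = sorted(declared - referenced)
print("dead variable declarations:", dead)   # -> ['temp']
```

A production static analyzer works on the compiler's flow graph rather than on raw text, but the principle, deducing anomalies without executing the program, is the same.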
In dynamic analysis, a trace is usually used to record the behavior of the program while in execution. Dynamic analysis can be performed by using either software or hardware. The drawback of this technique is that nothing is recorded regarding program alternatives that were not executed [Cail87]. Dynamic analysis is particularly useful in the detection of deadness errors. Deadness errors are discussed below. Examples of monitors that perform dynamic analysis are MON [Luck85] and EDEN [Chen87]. These tools are discussed in Chapter Three.
Flowback analysis is used to provide information on causal relationships between events during the execution of a program, without the need to re-execute the program for debugging purposes [Mill88]. PPD [Mill88] is an example of a debugger that uses flowback analysis.
Semantic analysis is used to generate certain types of information about the program, for example, which variables may be updated when a routine is executed. This technique was originally used in compiler optimization. According to Aho and Ullman [Aho79],
The term semantic analysis is applied to the determination of the type of intermediate results, the check that arguments are of types that are legal for an application of an operator, and the determination of the operation denoted by the operator (e.g., + could denote fixed or floating add, perhaps logical "or", and possibly other operations as well). Semantic analysis can be done during the syntax analysis phase, the intermediate code generation phase, or the final code generation phase.
PPD [Mill88] is an example of a debugger that uses semantic analysis.
When a single debugger is used to work with more than one process, the problem of process resolution must be addressed. Process resolution consists of specifying to the debugger which process one wishes to debug.
In this section, several non-syntactic errors related to concurrency are discussed.
2.14.1. Task Sequencing Errors
Task sequencing errors are those errors in which unpredictable task interactions occur [Luck85]. Race conditions, discussed below, provoke task sequencing errors.
Race conditions occur when more than one task is updating a variable (or changing the state of a device) and the order of update is non-deterministic. A formal definition for the detection of race conditions is given by Miller and Choi [Mill88]:
... for two edges e1 and e2, e1 -> e2 is true if n1 -> n2 is true, where n1 is the end node of e1 and n2 is the start node of e2. . . . Two edges e1 and e2 are simultaneous edges if ¬(e1 -> e2) ∧ ¬(e2 -> e1). . . .
The READ_SET(ei) is the set of the shared variables read-accessed in the edge ei. The WRITE_SET(ei) is defined similarly. ...
We say two simultaneous edges e1 and e2 are race-free if all the following conditions are true: a) WRITE_SET(e1) ∩ WRITE_SET(e2) = ∅. b) WRITE_SET(e1) ∩ READ_SET(e2) = ∅. c) READ_SET(e1) ∩ WRITE_SET(e2) = ∅. In other words, two simultaneous edges e1 and e2 are race-free if there exists no write/write conflict or read/write conflict between them. ...
An execution instance of a program is said to be race-free if all pairs of the simultaneous edges of the execution instance are race free.
For an example, see Program Pgm2, discussed in Chapters Five and Six.
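The condition above translates directly into code. The following Python sketch (illustrative only; not taken from PPD or any tool discussed here) checks whether two simultaneous edges, given their read and write sets, are race-free:

```python
def race_free(read1, write1, read2, write2):
    """Two simultaneous edges are race-free iff there is no write/write
    conflict and no read/write conflict between their variable sets."""
    return not (write1 & write2      # condition (a): write/write
                or write1 & read2    # condition (b): write/read
                or read1 & write2)   # condition (c): read/write

# Two edges that both write the shared variable X conflict:
print(race_free({"A"}, {"X"}, {"B"}, {"X"}))   # -> False
# Two edges that only read X are race-free (read/read is harmless):
print(race_free({"X"}, set(), {"X"}, set()))   # -> True
```

Note that read/read overlap is deliberately not tested: two concurrent reads of the same variable cannot produce a race.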
The initialization anomaly and the update anomaly discussed below are two types of task sequencing errors.
2.14.1.1.1. Initialization Anomaly
An initialization anomaly may occur when global objects are initialized at declaration time and a set of tasks perform read operations over these objects concurrently with their elaboration. Program Pgm0, shown in Appendix C, is an example of such a problem. The initialization anomaly is also discussed in Chapter Four.
2.14.1.1.2. Update Anomaly
An update anomaly may occur when a global object is used in read/write operations by more than one task without the enforcement of mutual exclusion and/or synchronization protocols. The occurrence of such an anomaly may cause the storage of incorrect values. As an example, consider the case where two tasks, T1 and T2, both perform an increment operation on a global object, X, specified by the statement
X := X + 1;
A pseudo-assembler equivalent of the above statement could be expressed by the following three instructions:
(1) Load the contents of X into register R1
(2) Add 1 to the contents of register R1
(3) Store the contents of register R1 into X
Assume that the initial value of X is 0. In the case of a single task, the resulting value of X, after the above instructions are executed, is assured to be 1. However, in the case of two tasks, where the above instructions are executed in an interleaved fashion, the resulting value of X may be either 1 or 2. The concurrent update of a global object typifies what is called a critical section [Deit84]. There are several techniques that can be used to correctly update such an object through the implementation of mutual exclusion. This issue is also discussed in Chapter Four. Program Pgm2, shown in Appendix C, exemplifies the occurrence of the update anomaly.
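The interleaving can be made concrete with a small simulation (a Python sketch; the schedules are chosen by hand for illustration). Each task executes the three pseudo-assembler instructions above, and the final value of X depends on the order in which the instructions are interleaved:

```python
# Simulate tasks T1 and T2 each executing: load X, add 1, store X.
# Each task has its own register; X is the shared global object.
def run(schedule):
    X = 0
    reg = {"T1": None, "T2": None}
    for task, op in schedule:
        if op == "load":
            reg[task] = X          # load the contents of X into R1
        elif op == "add":
            reg[task] += 1         # add 1 to R1
        elif op == "store":
            X = reg[task]          # store R1 back into X
    return X

serial = [("T1", "load"), ("T1", "add"), ("T1", "store"),
          ("T2", "load"), ("T2", "add"), ("T2", "store")]
interleaved = [("T1", "load"), ("T2", "load"), ("T1", "add"),
               ("T2", "add"), ("T1", "store"), ("T2", "store")]

print(run(serial))        # -> 2, the intended result
print(run(interleaved))   # -> 1, the update anomaly: one increment is lost
```

In the interleaved schedule, both tasks load X while it is still 0, so T2's store simply overwrites T1's; this is exactly the lost update that mutual exclusion over the critical section prevents.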
According to Luckham and Helmbold [Luck85], "Deadness is simply the inability of some or all of the active tasks to continue..."
When a deadness error occurs, a program is said to be in a deadlock state and is usually interrupted. Deadlock defines the state of a program in which at least one process is not progressing because it is waiting for the occurrence of an event that cannot happen.
2.14.2.2. Task Communication Deadlocks
According to Cheng et al. [Chen87],
A task communication deadlock in an Ada tasking program is a situation where all members of a group of tasks will wait forever at communication points, and hence can never proceed with their computation...
Cheng et al. [Chen87] also identified four types of communication deadlocks, which are discussed in the following sections.
2.14.2.2.1. Self Blocking Anomaly
The following program exemplifies the occurrence of a self-blocking anomaly (deadlock). The string I am here again! will be output after a rendezvous between Task Main and Task E is completed. The entry call E.C, issued within Task Main, works like a start button for Task E. After the Put_Line call is executed, Task E calls itself by executing the statement E.C. The Ada tasking model [DoD83] does not allow the use of recursion in tasks. It also requires that a task issuing an entry call be placed in the entry calling state, which causes the task's suspension. Task E is thus waiting for an event that will never occur! Tasks have their own properties and do not behave in the same way as procedures.
WITH Text_IO; USE Text_IO;
PROCEDURE S_Blocking IS
   TASK E IS
      ENTRY C;
   END E;
   TASK BODY E IS
   BEGIN
      ACCEPT C;
      Put_Line("I am here again!");
      E.C;
   END E;
BEGIN
   E.C;
END S_Blocking;
2.14.2.2.2. Circular Entry Call Anomaly
A circular entry call anomaly occurs when there is a chain of tasks in which each task issues an entry call to the next task in the chain. This problem is also known as circular wait [Deit84]. The following program exemplifies the occurrence of a circular entry call deadlock. Task A first issues an entry call to Task B, while Task B first issues an entry call to Task A. That is all that is needed for a deadlock to occur. The last two statements within each task will never be executed. The circular entry call anomaly is further discussed in Chapter Four.
WITH Text_IO; USE Text_IO;
PROCEDURE C_DeadLock IS
   TASK A IS ENTRY E1; END A;
   TASK B IS ENTRY E1; END B;
   TASK BODY A IS
   BEGIN
      B.E1;
      ACCEPT E1;
      Put_Line("A is done.");
   END A;
   TASK BODY B IS
   BEGIN
      A.E1;
      ACCEPT E1;
      Put_Line("B is done.");
   END B;
BEGIN
   NULL;
END C_DeadLock;
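A debugger can detect the circular wait above mechanically: model "task A is blocked on an entry call to task B" as an edge A -> B in a wait-for graph and search for a cycle. The following Python sketch (illustrative only; not part of any tool discussed here) performs that check with a depth-first search:

```python
# Detect a cycle in a wait-for graph given as {task: [tasks it waits on]}.
def has_cycle(wait_for):
    visited, stack = set(), set()

    def visit(node):
        if node in stack:          # back edge: node is on the current path
            return True
        if node in visited:
            return False
        visited.add(node)
        stack.add(node)
        if any(visit(n) for n in wait_for.get(node, ())):
            return True
        stack.discard(node)
        return False

    return any(visit(n) for n in list(wait_for))

# In C_DeadLock, A has called B.E1 and B has called A.E1:
print(has_cycle({"A": ["B"], "B": ["A"]}))   # -> True, a deadlock
print(has_cycle({"A": ["B"], "B": []}))      # -> False
```

This is the essence of the run-time deadlock detection performed by monitors such as those surveyed in the next chapter.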
2.14.2.2.3. Dependence Blocking Anomaly
According to Cheng et al. [Chen87],
The task T1 is (directly or indirectly) dependent on a block statement in the body of the task T2 or dependent on a subprogram called by T2, and T1 has issued an entry call to T2. When T2 proceeds into the execution of the block statement or the procedure call, the block statement or the procedure body will never terminate unless the dependent task T1 terminates. In this case, however, T1 will be blocked at the entry call to T2 which will never be accepted since T2 is at the block statement or the procedure call and will not execute the accept statement. Thus, T1 and T2 fall into a deadlock.
The next program exemplifies a dependence blocking occurrence.
WITH Text_IO; USE Text_IO;
PROCEDURE D_Deadlock IS
   TASK T2 IS
      ENTRY E;
   END T2;
   TASK BODY T2 IS
   BEGIN
      DECLARE
         TASK T1;
         TASK BODY T1 IS
         BEGIN
            T2.E;
            Put_Line("T1 is done.");
         END T1;
      BEGIN
         Put_Line("Block is done.");
      END;
      ACCEPT E;
      Put_Line("T2 is done.");
   END T2;
BEGIN
   Put_Line("Main is done.");
END D_Deadlock;
The body frames of both Main and the block statement reach completion, but not termination. The output of the program is:
Main is done.
Block is done.
dead-lock: system inactive
which clearly indicates that the accept statement within T2 was never executed because T1 did not terminate.
2.14.2.2.4. Acceptance Blocking Anomaly
An acceptance blocking deadlock occurs when a task is waiting at an accept statement and no entry call is ever issued. The following program exemplifies this anomaly:
WITH Text_IO; USE Text_IO;
PROCEDURE A_Blocking IS
   TASK Ack IS
      ENTRY Service;
   END Ack;
   TASK BODY Ack IS
   BEGIN
      ACCEPT Service;
      Put_Line("The service has been completed.");
   END Ack;
BEGIN
   NULL;
END A_Blocking;
2.14.2.3. Server Anomaly
A server anomaly occurs when a set of two or more tasks executes identical entry calls and the passive task, that is, the task that accepts these entry calls, is not allowed to answer all the calls. The following program shows the basic frame in which the server anomaly is identified:
WITH Text_IO; USE Text_IO;
PROCEDURE Server_Anomaly IS
   TASK TYPE Active;
   TASK Passive IS
      ENTRY E;
   END Passive;
   TASK BODY Active IS
   BEGIN
      Passive.E;
   END Active;
   TASK BODY Passive IS
   BEGIN
      ACCEPT E;
   END Passive;
BEGIN
   NULL;
END Server_Anomaly;
Program Pgm1, shown in Appendix C, also exemplifies such an anomaly, which results in a deadlock. This problem is discussed in Chapter Four.
The following chapter reviews many of the existing tools for debugging and monitoring concurrent programs.