With the exception of the 1996 update note below, this article is reprinted from the Proceedings of TRI-Ada '92, Orlando, FL, November 16-20, Association for Computing Machinery (ACM), New York, 1992.

Ada Outperforms Assembly: A Case Study


Patricia K. Lawlis, PhD
Air Force Institute of Technology
Department of Electrical and Computer Engineering
Wright-Patterson AFB, OH 45433-6583
(513) 255-6027


Terence W. Elam, PE, CMfgE
Defense Contracts Management Area Operations
DCMDM-GYE
1001 Hamilton St.
Dayton, OH 45444-5300
(513) 296-8421

Update: July 1996

HTML rendering by Michael Feldman, The George Washington University.

Pat Lawlis retired from the Air Force in 1995. She is currently President of a small, women-owned software engineering instruction and consulting company.

c.j. kemp systems, inc.
P.O. Box 586
Fairborn, OH 45324-0586
(937) 878-3303 voice and fax (Dayton office)
(602) 460-7399 voice and fax (Phoenix office)
lawlis@aol.com

Sadly, Terry Elam passed away in 1993.

Abstract

With the intent of getting an Ada waiver, a defense contractor wrote a portion of its software in Ada to prove that Ada could not produce real-time code. The expectation was that the resultant machine code would be too large and too slow to be effective for a communications application. However, the results demonstrated the opposite. With only minor source code variations, one version of the compiled Ada code was much smaller while executing at approximately the same speed, and a second version was approximately the same size but much faster than the corresponding assembly code.

1. The Assumption

Based on information received from another contractor, as well as several articles written on the subject, QRS (not the company's real name), a defense contractor, had taken the position that assembly language was the only appropriate language for its application. This was a small communications application, approximately 2000 lines of assembly code, originally targeted for the TI 320C15 digital signal processor (DSP) chip.

The Department of Defense (DOD) considered this application to fall under the category of mission critical computer resources (MCCR), and MCCR applications are required to be written in Ada [2]. However, the data compiled by QRS indicated that compiled Ada produced object code which was slower and consumed 3.5 times as much ROM as assembled code. The government program manager and his support groups could not disprove the QRS findings. Therefore, even though the contract specified the use of Ada, the program manager gave permission to start the application using assembly instead of Ada.

QRS completed an acceptable assembly version of the software in approximately 18 months. The developers started by completing the Software Requirements Specification. Then they used a structured approach in completing a Software Design Document. Finally, the entire application, 2000 lines of code, was written in assembly. At first it was not fast enough, so it had to be "tweaked" to achieve the required performance.

2. A Proof Was Required

One of the authors, MCCR focal point and program integrator for the Dayton area, had the job of reviewing DOD software development contracts to ensure current regulations were being followed. As soon as he found that QRS was not using Ada for this application, he started to investigate the reason. His main objective was to determine if Ada code could be developed for this application which would satisfy the system requirements--occupy no more than 2K of ROM and operate in a 500 microsecond time window. He spoke with experts with various backgrounds in widely diverse organizations--the chip manufacturer, an Ada vendor, an independent consultant, a government laboratory, and the Air Force's graduate school, the Air Force Institute of Technology.

The program integrator determined from the experts that Ada could not be categorically ruled out for this application. Hence, he indicated to the government program manager and QRS that the contract either had to use Ada or a waiver had to be obtained. So QRS tried to obtain a waiver using the findings of its initial investigation into Ada suitability as justification. But without actual benchmarking, this justification was deemed inadequate. QRS then decided to rewrite a portion of the software in Ada to prove that the waiver was justified, because its programmers truly believed Ada could not do the job.

3. The Proof Was Developed

To get its proof, QRS had one developer rewrite approximately twenty percent of the application, one Computer Software Component (CSC), in Ada. This developer had very limited Ada experience, having written only one Ada program, consisting of approximately 5000 lines of code, while working for a previous employer.

With direct access to the original experienced assembly developer, in only 16 labor hours the Ada developer reviewed an Ada textbook, reviewed the original software design, and then wrote the first version of this critical CSC in Ada. The Ada code consisted of five pages of code and comments, approximately 100 total lines of executable Ada. It was not a translation of the assembly code, but was written based on the original design document.

The first attempt at compiling this CSC, using the Meridian AdaVantage compiler, version 4.1, on an IBM PC-DOS system, resulted in one syntax error. After the syntax error was corrected, the new Ada CSC compiled successfully. All this was accomplished in only two days!

4. The Proof Backfired

The next step was to test the performance of this critical portion of the software. The original assembly code was written for the TI 320C15 DSP chip, but no Ada compiler was targeted to this chip. However, Tartan, Inc. had an Ada compiler targeted to the closely related TI 320C30 DSP chip, and this chip was chosen for the Ada application. Tartan agreed to test the newly developed Ada software at the Tartan facility in Pittsburgh.

Tartan had benchmarked its compiler at 1.3:1, compiled Ada size vs. assembly size, with comparable performance. This benchmark was established by comparing a 10,000-line assembly program, written by a very experienced assembly programmer, with Ada code written by a relatively inexperienced Ada programmer.

Dave Syiek of Tartan performed the tests for QRS in July 1991. He began with the Ada which had been developed by QRS and an assembly version of the same CSC targeted to the C30 chip. This assembly was a translation of that which QRS had originally developed for the C15 chip. [4]

Syiek tested each version for speed by using the common benchmarking technique of placing the code to be timed inside a loop, timing a very large number of iterations through the loop, and then figuring the average time per loop iteration. He experimented with several combinations of compiler options with each version, and then he documented the fastest result for each. These results of the tests on the original working versions indicated little difference in running times between the assembly and the Ada, but the Ada was significantly smaller (see Table 1). [4]
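This timing technique can be sketched in C. This is a minimal illustration with hypothetical names; the actual measurements were made with the Tartan toolchain for the C30, not with the standard library clock.

```c
#include <stdio.h>
#include <time.h>

/* Stand-in for the code being measured; the real benchmark timed the
   critical CSC itself. The volatile sink keeps the compiler from
   optimizing the loop body away. */
volatile long sink;
void code_under_test(long i) { sink = i * 3 + 1; }

/* Time a very large number of iterations through a loop around the
   code under test, then figure the average time per loop iteration. */
double average_usecs(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        code_under_test(i);
    clock_t stop = clock();
    return (double)(stop - start) * 1e6 / CLOCKS_PER_SEC / iterations;
}
```

In practice the overhead of the empty loop is typically measured separately and subtracted, and the iteration count is chosen large enough to swamp the resolution of the clock.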

Because of the size discrepancy between the original assembly and Ada versions, Syiek examined the code to determine the reason for it. He discovered that the assembly code had "unrolled" a loop which was executed six times. This means the code was copied six times instead of using a loop. It is a technique used to avoid the overhead of a looping structure and thus speed up code execution. To make something comparable in the Ada code, Syiek created a generic unit for the algorithm inside the loop, and he replaced the looping structure with six instantiations of the generic. At the same time, he also noticed three places where the addition of a local variable would avoid an unnecessary recalculation, and he added these variables. The fastest run of this new version of the Ada was comparable in size to the assembly code. However, the compiled Ada code ran approximately twice as fast as the assembly (see Table 1)! [4]

Table 1

      +-----------------+-----------+------------+
      | Code Version    |  Size in  |  Speed in  |
      |                 |  Words    |  Microsecs |
      +-----------------+-----------+------------+
      | Assembly        |   410     |   48.4497  |
      | Original Ada    |   134     |   49.3164  |
      | New Ada         |   414     |   25.1892  |
      +-----------------+-----------+------------+

It should be noted that the Ada was always compiled with all error checking suppressed and the default optimization level of the Tartan compiler (optimizing for speed inside loops, size outside). Also, the same linking optimizations were used for both the assembly and the Ada runs. [4]
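The unrolling transformation described above can be illustrated in C. The algorithm shown is an illustrative stand-in, since the actual QRS computation is not published; the revised Ada achieved the same effect with six instantiations of a generic unit.

```c
/* Illustrative stand-in for the algorithm inside the loop; the actual
   QRS computation is not published. */
int step(int acc, const int x[], int i) { return acc + x[i] * x[i]; }

/* Rolled version: the loop executes six times, paying the loop
   overhead on every iteration. */
int sum_rolled(const int x[6])
{
    int acc = 0;
    for (int i = 0; i < 6; i++)
        acc = step(acc, x, i);
    return acc;
}

/* Unrolled version: the body is copied six times, as the hand-tuned
   assembly did. Both versions compute the same result. */
int sum_unrolled(const int x[6])
{
    int acc = 0;
    acc = step(acc, x, 0);
    acc = step(acc, x, 1);
    acc = step(acc, x, 2);
    acc = step(acc, x, 3);
    acc = step(acc, x, 4);
    acc = step(acc, x, 5);
    return acc;
}
```

The unrolled form trades code size for speed: no loop counter, no compare, and no branch per iteration.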

5. The Application Was Rewritten

Because of the profound results, QRS decided to drop its pursuit of a waiver and instead to use Ada for 100% of this project. Use of the C30 chip in place of the C15 chip from the original system design resulted in some significant changes. The C30 chips were more expensive, and additional RAM chips were required for the development stage because the C30 chips are programmed by masking or by using other RAM. The total increase in hardware cost was just over $1000 per unit, and the power consumption was increased by 10 times (0.5 watts to 5 watts). However, the increased power consumption was acceptable, and the increased hardware cost was believed to be offset by the reduction in software development cost and reduced life cycle cost.

By the end of August 1991, the Ada version of the software had been completely written by the one Ada developer. These 700 lines of Ada took three weeks of labor, approximately 150 hours, to develop. This was coding time only, because the documentation did not have to be redone. The original development in assembly had taken approximately 1500 hours, with 600 of those devoted to actual coding.

6. How Can This Be?

How can a compiler for a high order language beat assembly code in both size and performance? It is because of a reasonably high level of maturity on the part of both compiler technology in general and the compiler vendor in particular. When a vendor brings a wealth of experience to bear on the task of optimization, it goes beyond the capabilities of any one individual, no matter how experienced.

A high order language, such as Ada, will have certain inefficiencies in its unoptimized compiled code. These are a result of the inherent concept of a high order language: to suppress detail and simplify the expression of algorithms. Compiler optimizations do not actually produce optimal code; rather, they perform transformations on the code in an attempt to reduce the space and/or time required for execution. Producing truly optimal code is computationally intractable, so the available techniques are all based on heuristics. There are many different optimizing techniques, and thus numerous heuristics which can be used for code transformations. [1, 6]
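One such transformation, common subexpression elimination, can be illustrated with a hypothetical C fragment. It is also the transformation Syiek applied by hand when he added local variables to avoid unnecessary recalculation.

```c
/* Before: the subexpression (a + b) is computed twice. */
int before_cse(int a, int b, int c)
{
    return (a + b) * c + (a + b);
}

/* After: the compiler (or a programmer, via a local variable)
   computes the common subexpression once and reuses it. */
int after_cse(int a, int b, int c)
{
    int t = a + b;    /* common subexpression held in a temporary */
    return t * c + t;
}
```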

With many years of experience of hundreds of professional programmers to draw upon, a compiler can implement hundreds of heuristics from a library of knowledge in this area of technology. The accumulated experience represented in the optimization heuristics used by the compiler is the key to how well a compiler can do optimizations. A mature compiler uses "assembly code idioms, fast algorithms, and the best methods for doing any basic operation on the target machine." [3]

A skilled, experienced assembly language programmer may know many of the heuristics incorporated into a mature compiler, but not all of them. A group of assembly language programmers could possibly (though not likely) even know all that are implemented in a particular mature compiler product. However, a compiler can try a large number of optimizing techniques in a very short period of time. Most of these techniques interact, so optimization is iterated until there is no further change. [6] By comparison, a human must spend much time on even one optimization technique. Hence, there is a practical limit to the amount of optimizing that can be done by hand.
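The iterate-until-no-change strategy can be sketched with a toy rewriter. The string representation and the two rules are purely illustrative; a real compiler transforms an intermediate representation, not program text.

```c
#include <stdbool.h>
#include <string.h>

/* Replace the first occurrence of pat in s with sub (sub must be no
   longer than pat). Returns true if a rewrite was performed. */
bool rewrite(char *s, const char *pat, const char *sub)
{
    char *p = strstr(s, pat);
    if (p == NULL)
        return false;
    size_t plen = strlen(pat), slen = strlen(sub);
    memmove(p + slen, p + plen, strlen(p + plen) + 1);
    memcpy(p, sub, slen);
    return true;
}

/* Apply both rules repeatedly. Each application may expose a new
   opportunity for the other rule, so the techniques interact and the
   driver iterates until a full pass makes no further change. */
void optimize(char *s)
{
    bool changed = true;
    while (changed) {
        changed = false;
        changed |= rewrite(s, "+0", "");  /* algebraic: x+0 -> x */
        changed |= rewrite(s, "*1", "");  /* algebraic: x*1 -> x */
    }
}
```

For example, "x*1+0" first loses the "+0", which leaves "x*1" for the second rule to reduce to "x"; neither rule alone, applied once, would finish the job.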

7. An Additional Advantage

Assembly code is difficult to understand even if written with good software engineering techniques in mind. Readability and understandability are some of the advantages of using a high order language such as Ada. When assembly code is optimized, it can become virtually unreadable. This makes code maintenance extremely difficult. [6]

For example, the C30 DSP chip uses a pipeline architecture. Normal branch instructions require three cycles to execute. However, three more instructions can be executed after the branch has been initiated but before the transfer of control. To get maximum processor throughput it is necessary to use delayed branches whenever possible. The scheduling of delayed branches results in an altered order of instruction execution. Because of the use of three instructions during the branch delay in the C30, the resulting order of instruction execution can become very obscure. [5]

However, when an application is written in a high order language and optimized by the compiler, maintenance is not affected. The code will be as maintainable as the language and the skill of the software developers permit. The optimization will be done automatically by the compiler. And it can be turned off during debugging, when it may be important to observe the actual order of statement execution. [6] Since compiler optimization, as opposed to hand-optimized assembly, leads to much more maintainable code, it can also reduce tremendously the overall system cost.

8. Implications

Hence, we must no longer assume that the use of Ada will necessarily result in slower, larger machine code than hand-generated assembly code. As technology advances rapidly, experience can be invaluable, but it can also be misused.

In the case of compiler technology, the use of accumulated experience in compiler optimization techniques leads to more effective optimization and thus better quality applications. An accumulation of knowledge over time provides for technology advancement.

However, experience is also yesterday's answer to today's problems. In the case described herein, QRS had assumed that information provided by another contractor's recent experience was current, relevant, and factual. This assumption was in error. Experience is a dangerous indicator of the current state of technology when the technology is changing very rapidly.

9. Dispelling the Myths

Advancements in compiler technology have dispelled at least five common myths about Ada performance.

The results of this one small example of Ada versus assembly do not prove that Ada will provide such dramatic improvements for other sizes and types of applications. However, this was a significant challenge for the use of Ada. Typically, Ada has had its weakest showing in real-time applications as well as in relatively small applications. Hence, these results are very significant as an indicator that Ada systems and capabilities are maturing.

QRS is convinced. It has now decided to use Ada extensively because it believes the use of Ada will provide the company with a competitive edge in the marketplace.

Acknowledgments

The authors would like to thank Susan Englert and Dave Syiek of Tartan, Inc. for their assistance in clarifying the facts in this case study.

References

[1] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley: Reading, MA, 1986.

[2] DOD Directive 3405.2, 1987--later replaced with DODI 5000.2, 1991.

[3] D. A. Syiek, "Challenging Assembly Code Quality," Proceedings of the International Conference on DSP Applications and Technology, Berlin, Germany 1991.

[4] D. A. Syiek, phone conversation with P. K. Lawlis, 3 August 1992.

[5] D. A. Syiek and D. Burton, "Optimizing Ada Code for the TI SMJ320C30 Digital Signal Processor," Proceedings of the International Conference on DSP Applications and Technology, Brussels, Belgium 1990.

[6] W. M. Waite and G. Goos, Compiler Construction, Springer-Verlag: New York, NY, 1984.

Biographies

Patricia K. Lawlis received a BS in Mathematics from East Carolina University in 1967, an MS in Computer Systems from the Air Force Institute of Technology in 1982, and a PhD in Computer Science from Arizona State University in 1989. She is currently an Assistant Professor of Computer Science at the Air Force Institute of Technology (AFIT). She has been in the Air Force since 1974 and a member of the AFIT faculty since 1982. She is a member of the Association for Computing Machinery, the Computer Society of the Institute of Electrical and Electronics Engineers, Tau Beta Pi, and Upsilon Pi Epsilon.

Terence W. Elam received a BS in Industrial Engineering from Wayne State University in 1971. He has been employed by the U.S. Government since 1973, and is currently an engineer with the Defense Contracts Management Area Operations in Dayton, Ohio. He is a Professional Engineer and a Certified Manufacturing Engineer.