Wednesday, October 1, 2008

Question #4

Justify what situations or applications programmers will rather use Assembly Languages than Higher Level Progamming Languages and vice versa.

In the old days, it was pretty easy to understand that writing your programs in assembly would tend to yield higher performing results than writing in higher level languages. Compilers had solved the problem of "optimization" in too general and simplistic a way, and had no hope of competing with the sly human assembly coder. These days the story is slightly different. Compilers have gotten better and the CPUs have gotten harder to optimize for. Inside some research organizations the general consensus is that compilers could do at least as well as humans in almost all cases. During a presentation I gave to some folks at AT&T Bell labs (a former employer) I explained that I was going to implement a certain piece of software in assembly language, which raised eyebrows. One person went so far as to stop me and suggest a good C/C++ compiler that would do a very good job of generating optimized object code and make my life a lot easier. But have compilers really gotten so good that humans cannot compete? I offer the following facts: High level languages like C and C++ treat the host CPU in a very generic manner. While local optimizations such as loop unrolling, and register resource contention are easy for compilers to deal with, odd features like 32 byte cache lines, 8K data/code cache totals, multiple execution units, and burst device memory interfaces are something not easily expressed or exploited by a C/C++ compiler. On a Pentium, it is ordinarily beneficial to declare your data so that its usage on inner loops retains as much as possible in the cache for as long as possible. This can require bizarre declaration requirements which are most easily dealt with by using unions of 8K structures for all data used in your inner loops. This way you can overlap data with poor cache coherency together, while using as much of the remainder of the cache for data with good cache coherency. The pentium also has an auxiliary floating point execution unit which can actually perform floating point operations concurrently with integer computations. This can lead to algorithmic designs which require an odd arrangement of your code, that has no sensible correspondence with high level code that computes the same thing. Basically, on the pentium, C/C++ compilers have no easy way to translate source code to cache structured aware data and code along with concurrently running floating point. The MMX generation of x86 processors will pose even greater problems. Nevertheless I explained to the folks at Bell Labs that I owned the compiler that they suggested, and that when it came to optimizations, I could (can) easily code circles around it. The classic example of overlooking these points above is that of one magnanimous programmer who came from a large company and declared to the world, through USENET, that he was going to write a "100% optimal 3D graphics library completely in C++". He emphatically defended his approach with long flaming postings insisting that modern C++ compilers would be able to duplicate any hand rolled assembly language optimization trick. He got most of the way through before abandoning his project. He eventually realized that the only viable solution for existing PC platforms is to exploit the potential for pipelining the FPU and the integer CPU instructions in a maximally concurrent way. Something no x86 based C compiler in production today is capable of doing. I always roll my eyes when the old debate of whether or not the days of hand rolled assembly language are over resurfaces in the USENET assembly language newsgroups. On the other hand perhaps I should be thankful since these false beliefs about the abilities of C/C++ compilers in other programmers only, by comparison, differentiates my abilities more clearly to my employer. The conclusion you should take away from this (and my other related web pages) is that when pedal to the metal kinds of performance is required, that there is a significant performance margin to be gained in using assembly language. Ordinarily, one combines C/C++ and assembly by using the compiler's inline assembly feature, or by linking to a library of hand rolled assembly routines.

No comments: