I'm in the middle of writing this. Feedback encouraged, non-constructive criticism ignored.
There's an engineering maxim known as the "golden triangle": "fast, cheap, good: pick two". It's a pithy saying designed to get the following idea across - there's no such thing as a free lunch. You can build something quickly, you can build something cheaply, or you can build something well. Actually building something requires you to balance these three factors, as your project requires.
In the same manner, performance is all about tradeoffs. Typically, the trade is memory footprint for speed. If you need your code to fit into a small amount of memory, your options for making it run faster are extremely limited. Most techniques for performance enhancement involve using memory in some manner, whether they're simple caches or coding techniques that "unroll" loops.
There's no real mystery about writing a Java program that performs well. As with developing for any platform, you simply must realize what operations are costly, and minimize your reliance on those operations. Below, I will list some of these operations. Some of these are the same as on any other system, while some are unique to Java, the language, and Java, the platform. Since my main experience is with coding back-end servers, I'll be concentrating on tips that I've picked up there, and thus ignoring some very valid areas, like AWT performance.
|I assume that C++ is faster than Java, but can anyone tell me for a
|fact. Does this mean it is faster in all aspects or just some.
Of course C++ is faster: it's a fact. It's faster even when you use an inappropriate algorithm, or when you have a slower machine or when you don't have enough memory. It's also much faster to develop provably bug-free software in C++, see, for example, the software produced by Microsoft. I suggest the next discussion should be whether boats are faster than cars.
Now, on to the tips:
This is a basic topic, applicable to all computer languages, but bears repeating here.
To understand the difference between accessing something in memory, vs. accessing something on disk, just remember this: memory speed is measured in nanoseconds, while disk speed is measured in milliseconds. In the real world, this means that if reading from memory is like reading a book, reading from disk is like getting the book from the library, bringing it home, and then reading it - one letter at a time, aloud. The difference is substantial.
In this vein, buffers (which store things in memory) are crucial. The difference between a buffered read and an unbuffered read is very, very large. This is just as true in Java as it is in C. Unless you have something specific in mind, always use a buffered read. Not only does the overhead of byte-at-a-time reads affect your program, it also places a substantial resource drain on the entire computer your program is running on.
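To make the buffering point concrete, here is a minimal sketch that reads a file one byte at a time, once straight from a FileInputStream and once through a BufferedInputStream. The class name, the 8192-byte buffer size, and the file passed on the command line are all assumptions made for illustration; the actual timings depend on your JVM, operating system, and disk.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadSketch {
    // Counts the bytes in a file by calling read() once per byte.
    // Unbuffered: every read() goes to the underlying file.
    // Buffered: most read() calls are served from an in-memory buffer.
    static long countBytes(String path, boolean buffered) throws IOException {
        InputStream raw = new FileInputStream(path);
        // The 8192-byte buffer size is an arbitrary choice for this sketch.
        try (InputStream in = buffered ? new BufferedInputStream(raw, 8192) : raw) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args[0]; // hypothetical input: any reasonably large file
        for (boolean buffered : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long n = countBytes(path, buffered);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println((buffered ? "buffered:   " : "unbuffered: ")
                    + n + " bytes in " + micros + " us");
        }
    }
}
```

Both calls return the same count; only the number of trips to the operating system differs.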
Finally, always cache frequently used values, instead of reading them from disk. This goes all the way from storing values read from a Properties class in local variables, to implementing an LRU cache for files read from disk.
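One simple way to get an LRU cache in Java is to build on java.util.LinkedHashMap, which can keep its entries in access order and lets a subclass decide when to drop the eldest one. The sketch below assumes a fixed maximum number of entries; the class name LruCache is made up for this example, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small least-recently-used cache, sketched on top of LinkedHashMap.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // The third argument (accessOrder = true) makes get() and put()
        // move an entry to the "most recently used" end of the map.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

A cache for file contents could then be declared as, say, new LruCache<String, byte[]>(100), keyed by file name, with the limit chosen to fit the memory you can spare.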
As an example, suppose that you have a program that makes extensive use of a 1MB data file. When you profile your program, you discover that a great deal of time is spent waiting for data to be returned from disk. Such a program is described as being "I/O bound" or "disk bound". What to do? You have several options. If you feel you have the memory to burn, perhaps the best thing to do is simply to take the time at startup to read the whole darn thing into memory, and access it from there.
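A minimal sketch of the "read it all at startup" option follows. It assumes the file comfortably fits in memory and that byte-level access is what the program needs; the class and method names are invented for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Loads a data file into memory once, then serves every lookup from the array.
class InMemoryDataFile {
    private final byte[] data;

    InMemoryDataFile(String path) throws IOException {
        // One pass over the disk at startup; no disk access afterwards.
        this.data = Files.readAllBytes(Paths.get(path));
    }

    // Returns the unsigned value (0..255) of the byte at the given offset.
    int byteAt(int offset) {
        return data[offset] & 0xFF;
    }

    int size() {
        return data.length;
    }
}
```

The cost is a slower startup and a permanently larger memory footprint, which is exactly the memory-for-speed trade described earlier.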
As large as the difference is between memory and disk, so is the difference between reading something from disk, and retrieving it from the WAN. The caveat here is: don't go overboard. Remember, your end user is going to have a finite amount of disk space to spend on cache for your program.
If you wish to read a large file from a remote site, that's OK. You'll take a big performance hit, and there's nothing that you can do about it. But why do it again and again? Instead of reading the whole file every time, cache it on disk, and just check the timestamp to see if it's changed. (Note: this is how web browsers do it. The good ones, anyway.)
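The timestamp check can be sketched roughly as below. RemoteSource is a hypothetical interface standing in for whatever protocol you use to ask the remote site for its modification time and its contents; it is not a real library API. The sketch also assumes remote timestamps are meaningful to the second and that the local filesystem preserves modification times at that precision.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

class DiskCachedFile {
    // Hypothetical abstraction over the remote site (an assumption of this sketch).
    interface RemoteSource {
        long lastModifiedMillis() throws IOException; // cheap: timestamp only
        byte[] fetch() throws IOException;            // expensive: whole file
    }

    private final RemoteSource remote;
    private final Path cacheFile;

    DiskCachedFile(RemoteSource remote, Path cacheFile) {
        this.remote = remote;
        this.cacheFile = cacheFile;
    }

    byte[] get() throws IOException {
        // Truncate to whole seconds so the comparison does not depend on
        // sub-second precision that a filesystem may not store.
        long remoteStamp = remote.lastModifiedMillis() / 1000 * 1000;
        if (Files.exists(cacheFile)
                && Files.getLastModifiedTime(cacheFile).toMillis() >= remoteStamp) {
            return Files.readAllBytes(cacheFile); // cached copy is current
        }
        // Missing or stale: pay for the transfer once, then remember it.
        byte[] fresh = remote.fetch();
        Files.write(cacheFile, fresh);
        Files.setLastModifiedTime(cacheFile, FileTime.fromMillis(remoteStamp));
        return fresh;
    }
}
```

Each call still costs one small round trip for the timestamp, but the large transfer happens only when the remote copy has actually changed.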
There are two parts to the measurement of a network connection's speed. The first, throughput, measures how much data can go through the pipe per unit of time. The second, latency, is a measure of how long it takes the data to get there. These are related, of course, but not as tightly as you may think. Latency on some links (like satellite) can be significantly higher than on others, while throughput remains the same. A discussion of why is out of scope for this paper.
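A rough way to see how the two numbers interact is to model the time for a single transfer as latency plus size divided by throughput. This is a deliberate simplification (it ignores protocol handshakes, congestion, and retransmission), and the link figures below are made-up illustrative values, not measurements.

```java
class TransferTimeSketch {
    // Simplified model: total seconds = latency + payload size / throughput.
    static double transferSeconds(double latencySeconds, double bytes, double bytesPerSecond) {
        return latencySeconds + bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        double throughput = 1_000_000.0; // same for both links (illustrative)
        double lowLatency = 0.02;        // hypothetical terrestrial link, seconds
        double highLatency = 0.6;        // hypothetical satellite link, seconds

        double[] payloads = {100.0, 10_000_000.0}; // a tiny request and a large file
        for (double bytes : payloads) {
            System.out.printf("%.0f bytes: low-latency link %.4f s, high-latency link %.4f s%n",
                    bytes,
                    transferSeconds(lowLatency, bytes, throughput),
                    transferSeconds(highLatency, bytes, throughput));
        }
    }
}
```

Under this model, latency dominates small transfers and throughput dominates large ones, which is why many small requests over a high-latency link hurt far more than one big request.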
The source code to the Sun implementation of Java is available for the asking. Obtaining it usually involves only filling out a form and sending it in. With it, you can do the code analysis yourself, finding the expensive portions of the underlying implementation. Don't be scared to read the code - it's only C, after all!
Many people seem surprised when I suggest they buy a good profiling tool. Maybe because it's Java, they expect it to be free. I guess it all depends on how much you value your time. While it's possible to profile your program and improve its performance without a tool, it will take much longer, and the job will not be as complete. Spend the money.