Writing Performant Java Code

I'm in the middle of writing this. Feedback encouraged, non-constructive criticism ignored.

A word about the golden triangle

There's an engineering maxim known as the "golden triangle": "fast, cheap, good: pick two". It's a pithy saying designed to get the following idea across: there's no such thing as a free lunch. You can build something quickly, cheaply, or well, but you can't have all three at once. Actually building something requires you to balance these three factors as your project requires.

In the same manner, performance is all about tradeoffs. Typically, the trade is memory footprint for speed. If you need your code to fit into a small amount of memory, your options for making it run faster are extremely limited. Most performance-enhancing techniques use memory in some manner, whether they're simple caches or coding techniques that "unroll" loops.
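As a small illustration of spending memory to buy speed, here is a sketch (my own example, not taken from any particular library) that counts the 1-bits in an int using a precomputed 256-entry table instead of testing all 32 bits in a loop. The class and method names are hypothetical, and on a modern JVM `Integer.bitCount` is likely the better choice in real code; the point is only the shape of the trade.

```java
/**
 * Illustrates trading memory for speed: a 256-entry table (256 bytes)
 * replaces a 32-step loop with four array lookups per int.
 * Hypothetical example class, for illustration only.
 */
public class BitCountTable {
    // Precomputed number of 1-bits for every possible byte value.
    private static final byte[] BITS_IN_BYTE = new byte[256];

    static {
        for (int i = 0; i < 256; i++) {
            // Entry i is the entry for i >> 1 (already computed) plus the low bit of i.
            BITS_IN_BYTE[i] = (byte) (BITS_IN_BYTE[i >> 1] + (i & 1));
        }
    }

    /** Slow version: examines all 32 bits one at a time. */
    public static int countBitsLoop(int x) {
        int count = 0;
        for (int i = 0; i < 32; i++) {
            count += (x >>> i) & 1;
        }
        return count;
    }

    /** Fast version: one table lookup per byte of the int. */
    public static int countBitsTable(int x) {
        return BITS_IN_BYTE[x & 0xFF]
             + BITS_IN_BYTE[(x >>> 8) & 0xFF]
             + BITS_IN_BYTE[(x >>> 16) & 0xFF]
             + BITS_IN_BYTE[(x >>> 24) & 0xFF];
    }

    public static void main(String[] args) {
        int value = 0xF0F0_1234;
        System.out.println(countBitsLoop(value) + " " + countBitsTable(value));
    }
}
```

The table costs 256 bytes, filled once when the class loads; every call after that does four lookups instead of 32 shift-and-mask steps.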

Tips for Performance

There's no real mystery about writing a Java program that performs well. As with developing for any platform, you simply need to know which operations are costly, and minimize your reliance on those operations. Below, I will list some of these operations. Some of these are the same as on any other system, while some are unique to Java, the language, and Java, the platform. Since my main experience is with coding back-end servers, I'll be concentrating on tips that I've picked up there, and thus ignoring some very valid areas, like AWT performance.

  1. Use memory, rather than files.
  2. Use files, rather than networks.
  3. Keep your connections open.
  4. The cost of internationalization.
  5. Strings are not free.
  6. Byte arrays, rather than Strings.
  7. Threads are not free, but they're cheap.
  8. Avoid synchronization.
  9. Avoid new objects in a large multithreaded application.
  10. Use the Source, Luke.
  11. Pay the money already.

Heard on the Net:
|I assume that C++ is faster than Java, but can anyone tell me for a
|fact. Does this mean it is faster in all aspects or just some.

Of course C++ is faster: it's a fact. It's faster even when you use
an inappropriate algorithm, or when you have a slower machine or when
you don't have enough memory.. It's also much faster to develop
provably bug-free software in C++, see, for example, the software
produced by Microsoft.

I suggest the next discussion should be whether boats are faster than
cars.

Now, on to the tips:

Use memory rather than files

This is a basic topic, applicable to all computer languages, but it bears repeating here.

Nano vs. Milli

To understand the difference between accessing something in memory and accessing something on disk, just remember this: memory speed is measured in nanoseconds, while disk speed is measured in milliseconds, and a millisecond is a million nanoseconds. In the real world, this means that if reading from memory is like reading a book, reading from disk is like getting the book from the library, bringing it home, and then reading it - one letter at a time, aloud. The difference is substantial.

Buffers are your friend

In this vein, buffers (which store things in memory) are crucial. The difference between a buffered read and an unbuffered read is very, very large. This is just as true in Java as it is in C. Unless you have something specific in mind, always use a buffered read. The overhead of reading a byte at a time doesn't just slow down your program; it also places a substantial resource drain on the entire computer you are running your program on.
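A minimal sketch of a buffered read, assuming you want to process a file one byte at a time. `BufferedCount` and `countBytes` are hypothetical names, and the 64 KB buffer size is an arbitrary choice; `BufferedInputStream`'s default of 8 KB is usually fine.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedCount {
    /**
     * Counts the bytes in a file, reading one byte at a time.
     * Because the FileInputStream is wrapped in a BufferedInputStream, most
     * read() calls are served from an in-memory buffer rather than going
     * to the operating system for every single byte.
     */
    public static long countBytes(String path) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 64 * 1024)) {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countBytes(args[0]));
    }
}
```

Drop the `BufferedInputStream` wrapper and each `read()` goes straight to the underlying `FileInputStream`, which typically means one native call per byte.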

Cache, cache, and cache

Finally, always cache frequently used values instead of reading them from disk. This goes all the way from storing values read from a Properties object in local variables to implementing an LRU cache for files read from disk.
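For the LRU end of the spectrum, `java.util.LinkedHashMap` already does most of the work: constructed with `accessOrder = true`, it keeps entries ordered from least to most recently accessed, and overriding `removeEldestEntry` lets it evict automatically. A minimal sketch, assuming a fixed limit on the number of entries (`LruCache` is a hypothetical name); a cache of file contents would more likely bound total bytes instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A minimal LRU cache built on LinkedHashMap's access-order mode.
 * Not thread-safe: guard it with a lock if it is shared between threads.
 */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    public LruCache(int maxEntries) {
        // accessOrder = true: get() and put() move an entry to the most-recently-used end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the least recently used entry when over capacity.
        return size() > maxEntries;
    }
}
```

For example, `new LruCache<String, byte[]>(100)` keeps at most 100 entries, discarding whichever one was used least recently.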

An example

As an example, suppose that you have a program that makes extensive access to a 1MB data file. When you profile your program, you discover that a great deal of time is spent waiting for data to be returned from disk. The program is described as being "IO Bound" or "Disk Bound". What to do? You have several options. If you feel you have the memory to burn, perhaps the best thing to do is simply to take the time at startup to read the whole darn thing into memory, and access it from memory.
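A minimal sketch of the read-it-all-at-startup approach, assuming the file fits comfortably in the heap. `InMemoryDataFile` and its methods are hypothetical names standing in for whatever accessors the program really needs; `Files.readAllBytes` is the standard NIO call doing the one bulk read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Reads a data file into memory once, at startup, then serves all later
 * reads from the in-memory copy instead of going back to disk.
 */
public class InMemoryDataFile {
    private final byte[] data;

    public InMemoryDataFile(Path path) throws IOException {
        // One bulk read: a 1MB file costs roughly 1MB of heap from here on.
        this.data = Files.readAllBytes(path);
    }

    /** Returns the byte at the given offset, with no disk access. */
    public byte byteAt(int offset) {
        return data[offset];
    }

    /** Copies length bytes starting at offset into dest, with no disk access. */
    public void read(int offset, byte[] dest, int destOffset, int length) {
        System.arraycopy(data, offset, dest, destOffset, length);
    }

    public int length() {
        return data.length;
    }

    public static void main(String[] args) throws IOException {
        InMemoryDataFile file = new InMemoryDataFile(Paths.get(args[0]));
        System.out.println("Loaded " + file.length() + " bytes");
    }
}
```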

Use files, rather than networks

Local cache

As large as the difference is between memory and disk, so is the difference between reading something from disk and retrieving it over the WAN. The caveat here: don't go overboard. Remember, your end user has only a finite amount of disk space to spend on cache for your program.
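One way to avoid going overboard is to give the cache a fixed byte budget and trim it now and then. The sketch below assumes a flat cache directory and treats the oldest files (by last-modified time) as the least valuable; `DiskCacheTrimmer`, `trim`, and the budget are hypothetical, and a real program might prefer to evict by last access instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DiskCacheTrimmer {
    /**
     * Deletes the oldest regular files in cacheDir until the total size of
     * the files that remain is at most maxBytes. Returns the bytes freed.
     */
    public static long trim(Path cacheDir, long maxBytes) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(cacheDir)) {
            files = entries.filter(Files::isRegularFile)
                           .collect(Collectors.toCollection(ArrayList::new));
        }
        // Oldest first, so they are the first to be deleted.
        files.sort(Comparator.comparingLong(p -> p.toFile().lastModified()));

        long total = 0;
        for (Path f : files) {
            total += Files.size(f);
        }
        long freed = 0;
        for (Path f : files) {
            if (total <= maxBytes) {
                break; // within budget: stop deleting
            }
            long size = Files.size(f);
            Files.delete(f);
            total -= size;
            freed += size;
        }
        return freed;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Freed " + trim(Paths.get(args[0]), Long.parseLong(args[1])) + " bytes");
    }
}
```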

An example

If you wish to read a large file from a remote site, that's OK. You'll take a big performance hit, and there's nothing that you can do about it. But why do it again and again? Instead of reading the whole file every time, cache it on disk, and just check the timestamp to see if it's changed. (Note: this is how web browsers do it. The good ones, anyway.)
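A sketch of the check-the-timestamp approach using `java.net.URLConnection`. It assumes the cached copy lives in a single local file whose modification time mirrors the remote one; `RemoteFileCache` and `refresh` are hypothetical names. For HTTP URLs, `setIfModifiedSince` asks the server to answer `304 Not Modified` instead of resending the file; for other URL types the code falls back to comparing `getLastModified()` with the cached file's timestamp.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;

public class RemoteFileCache {
    /**
     * Makes sure cacheFile holds a current copy of the resource at url.
     * Returns true if the resource was downloaded, false if the cached copy was kept.
     */
    public static boolean refresh(URL url, Path cacheFile) throws IOException {
        long cachedTime = Files.exists(cacheFile)
                ? Files.getLastModifiedTime(cacheFile).toMillis()
                : 0L;

        URLConnection conn = url.openConnection();
        if (cachedTime != 0L) {
            // Over HTTP this sends an If-Modified-Since header; other protocols may ignore it.
            conn.setIfModifiedSince(cachedTime);
        }
        if (conn instanceof HttpURLConnection
                && ((HttpURLConnection) conn).getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return false; // the server says our copy is still current
        }

        long remoteTime = conn.getLastModified(); // 0 if the source doesn't report one
        try (InputStream in = conn.getInputStream()) {
            // Last-Modified values have one-second resolution, so compare whole seconds.
            if (cachedTime != 0L && remoteTime != 0L && remoteTime / 1000 <= cachedTime / 1000) {
                return false; // unchanged: keep the cached copy
            }
            Files.copy(in, cacheFile, StandardCopyOption.REPLACE_EXISTING);
        }
        if (remoteTime != 0L) {
            // Give the cached copy the remote timestamp so the next check compares like with like.
            Files.setLastModifiedTime(cacheFile, FileTime.fromMillis(remoteTime));
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        boolean downloaded = refresh(URI.create(args[0]).toURL(), Paths.get(args[1]));
        System.out.println(downloaded ? "downloaded" : "using cached copy");
    }
}
```

The sketch keeps the cached copy only when the source reports a last-modified time; if it doesn't, it plays safe and downloads again.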

Keep your connections open

Latency and throughput

There are two parts to the measurement of a network connection's speed. The first, throughput, measures how much data can go through the pipe in a given time. The second, latency, measures how long it takes the data to get there. These are related, of course, but not as tightly as you may think. Latency on some links (like satellite) can be significantly higher than on others with comparable throughput. A discussion of why is out of scope for this paper.
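A back-of-the-envelope model makes the distinction concrete, under the simplifying assumption that every exchange costs one round trip plus its bytes divided by the link's throughput. The numbers in `main` are hypothetical, chosen only to resemble a high-latency satellite link, and `LinkCostModel` is likewise a made-up name. The second method models sending every request without waiting for each reply, so the round trip is paid roughly once.

```java
public class LinkCostModel {
    /** Time in ms to send `requests` messages, waiting for each reply before sending the next. */
    public static long sequentialMillis(int requests, long bytesEach, long rttMillis, long bytesPerSecond) {
        long transferMillis = requests * bytesEach * 1000 / bytesPerSecond;
        return requests * rttMillis + transferMillis;
    }

    /** Time in ms when all messages are sent back to back and one round trip covers them. */
    public static long pipelinedMillis(int requests, long bytesEach, long rttMillis, long bytesPerSecond) {
        long transferMillis = requests * bytesEach * 1000 / bytesPerSecond;
        return rttMillis + transferMillis;
    }

    public static void main(String[] args) {
        // Hypothetical link: 500 ms round trip, 1,000,000 bytes/s, 100 requests of 1,000 bytes each.
        System.out.println(sequentialMillis(100, 1000, 500, 1_000_000)); // 50100
        System.out.println(pipelinedMillis(100, 1000, 500, 1_000_000));  // 600
    }
}
```

Under these assumptions the link's throughput barely matters: moving the data takes a tenth of a second, while waiting for a hundred separate round trips takes fifty.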

Pipelining

One way to get around the latency inherent in all connections, and instead take advantage of the total throughput, is called pipelining. When you pipeline a connection, you talk more than you listen. The normal

Connection setup

Connection pools

One way to get around connection setup times

URLConnection and you

The costs of internationalization

Strings are not free

Concatenation - hah!

Internationalization

Byte arrays, rather than Strings

Threads are not free, but they're cheap

Avoid synchronization

Avoid new objects in large multithreaded apps

Use the Source, Luke.

The source code to the Sun implementation of Java is available for the asking. Obtaining it usually involves only filling out a form and sending it in. With it, you can do the code analysis yourself, finding the expensive portions of the underlying implementation. Don't be scared to read the code - it's only C, after all!

Pay the money already.

Many people seem surprised when I suggest they buy a good profiling tool. Maybe because it's Java, they expect it to be free. I guess it all depends on how much you value your time. While it's possible to profile your program and improve its performance without a tool, it will take much longer, and the job will not be as complete. Spend the money.