This appendix was
contributed by and used with the permission of Joe Sharp, consultant
(SharpJoe@aol.com).
The Java
language emphasizes accurate, reliable behavior at the expense of performance.
This is reflected in features such as automatic garbage collection, rigorous
runtime checking, complete bytecode verification, and conservative runtime
synchronization. Availability on a wide range of platforms means that, at present,
Java runs on an interpreted virtual machine, which further handicaps performance.
On the subject of performance, Steve McConnell [16]
offers this advice: “Complete it first, and then perfect it. The part that needs to be
perfect is usually small.” This appendix will help you locate and
optimize that “part that needs to be
perfect.”
You should address performance only
after you have a correct and fully tested program:
Finding the critical
bottleneck is the key to cost-effective effort – Donald Knuth [9] improved
a program where 50 percent of the time was spent in less than 4 percent of the
code. He changed a few lines in an hour of work and doubled the program speed.
Working on the rest of the program would have dissipated his valuable time and
effort. To quote Knuth, “Premature optimization is the root of all
evil.” It is wise to restrain the impulse to optimize early, because doing so
may force you to forgo useful programming techniques, resulting in code that’s
harder to understand, riskier, and more expensive to
maintain.
“Profile” code by
inserting explicit timing:

long start = System.currentTimeMillis();
// Operation to be timed goes here
long time = System.currentTimeMillis() - start;
Have an infrequently used method
print the cumulative times to the console window with
System.out.println( ). Since the compiler eliminates the guarded code when
the flag is false, a static final boolean switch can turn the timing on and off,
so the timing code can efficiently be left in place in released code, ready for
emergency use at any time. Even when more sophisticated profiling is available,
this is a convenient way to time a specific task or operation.
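A minimal sketch of this switch pattern (the class and field names here are illustrative, not from the original):

public class TimingDemo {
  // Compile-time switch: when false, javac removes the guarded code entirely
  static final boolean TIMING = true;
  static long cumulative = 0;

  static void timedOperation() {
    long start = 0;
    if(TIMING)
      start = System.currentTimeMillis();
    // Operation to be timed goes here
    if(TIMING)
      cumulative += System.currentTimeMillis() - start;
  }

  // Infrequently-used method that reports the accumulated time
  static void reportTimes() {
    if(TIMING)
      System.out.println("Cumulative ms: " + cumulative);
  }
}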
System.currentTimeMillis( )
returns the time in milliseconds. However, some systems have a timer
resolution coarser than one millisecond (a Windows PC, for example), so you may
need to repeat an operation n times and divide the total time by n to get an
accurate estimate.
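For example, a sketch of this repetition technique (the repetition count is arbitrary):

public class RepeatTiming {
  public static void main(String[] args) {
    int n = 1000000; // Choose n large enough to swamp the timer resolution
    long start = System.currentTimeMillis();
    for(int i = 0; i < n; i++) {
      // Operation to be timed goes here
    }
    long total = System.currentTimeMillis() - start;
    System.out.println("Average ms per operation: " + (total / (double)n));
  }
}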
The JDK comes with a built-in
profiler that keeps track of the time spent in each routine and writes the
information to a file. Unfortunately, the JDK profilers have been uneven in
quality: the JDK 1.1.1 profiler works, but subsequent releases have had various
instabilities.
To run the profiler, use the
-prof option when invoking the unoptimized versions of the Java
interpreter, for example:
java_g -prof myClass
Or with an applet:
java_g -prof sun.applet.AppletViewer applet.html
The profiler output is not
particularly easy to decipher. In fact, in JDK 1.0 it truncates the method names
to 30 characters, so it might not be possible to distinguish between some
methods. However, if your platform does support the -prof option, either
Vladimir Bulatov’s HyperProf [3] or Greg White’s
ProfileViewer [4] will help interpret the
results.
The best way to keep up with the
exploding field of performance optimization tools is through a Web site such as
Jonathan Hardwick’s Tools for Optimizing Java [5] at
http://www.cs.cmu.edu/~jch/java/tools.html.
Now that the critical region has
been isolated, you can apply two types of
optimizations: generic techniques and those
specific to Java.
An effective generic speedup is to
redefine the problem in a more practical way. For example, in Programming
Pearls [14], Bentley describes Doug McIlroy’s representation of the
English language with a novel data depiction that enabled him to produce a
remarkably fast, compact spelling checker. In addition, choosing a better
algorithm will probably give a bigger performance gain than any other approach,
particularly as the size of the data set increases. For more of these generic
approaches, see the general book listings [12-19] at the end of this appendix.
To put things in perspective,
it’s useful to look at the time it takes to perform various operations. So
that the results are relatively independent of the computer being used, they
have been normalized by dividing by the time it takes to make a local
assignment.
Operation                  | Example                      | Normalized time
---------------------------|------------------------------|----------------
Local assignment           | i = n;                       | 1.0
Instance assignment        | this.i = n;                  | 1.2
int increment              | i++;                         | 1.5
byte increment             | b++;                         | 2.0
short increment            | s++;                         | 2.0
float increment            | f++;                         | 2.0
double increment           | d++;                         | 2.0
Empty loop                 | while(true) n++;             | 2.0
Ternary expression         | (x<0) ? -x : x               | 2.2
Math call                  | Math.abs(x);                 | 2.5
Array assignment           | a[0] = n;                    | 2.7
long increment             | l++;                         | 3.5
Method call                | funct( );                    | 5.9
throw and catch exception  | try{ throw e; } catch(e){}   | 320
synchronized method call   | synchMethod( );              | 570
New Object                 | new Object( );               | 980
New array                  | new int[10];                 | 3100
Using present systems (such as a
200 MHz Pentium Pro running Netscape 3 and JDK 1.1.5), these relative times show the
extraordinary cost of creating new objects and arrays, the heavy cost of synchronization,
and the modest cost of an unsynchronized method call. References [5] and [6]
give the Web addresses of measurement applets you can run on your own
machine.
Here are some modifications that
you can make to speed up time-critical parts of your Java program. (Be sure to
test the performance before and after you try them.)
The cost of Strings: The
String concatenation operator + looks innocent but involves a lot of
work. The compiler can efficiently concatenate constant strings, but a variable
string requires considerable processing. For example, if s and t
are String variables:
System.out.println("heading" + s + "trailer" + t);
this requires a new
StringBuffer, appending arguments, and converting the result back to a
String with toString( ). This costs both space and time. If
you’re appending more than one String, consider using a
StringBuffer directly, especially if you can repeatedly reuse it in a
loop. Preventing the creation of a new StringBuffer on each iteration
avoids the relative object-creation cost of 980 shown in the table above. Using
substring( ) and the other String methods is usually an
improvement. When feasible, character arrays are even faster. Also notice that
StringTokenizer is costly because of synchronization.
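Here is a sketch of reusing one StringBuffer across loop iterations (the method and variable names are illustrative):

static void printAll(String[] lines) {
  StringBuffer buf = new StringBuffer();
  for(int i = 0; i < lines.length; i++) {
    buf.setLength(0); // Clear and reuse; no new StringBuffer per iteration
    buf.append("heading").append(lines[i]).append("trailer");
    System.out.println(buf.toString());
  }
}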
Synchronization: In the JDK
interpreter, calling a synchronized method is typically 10 times slower
than calling an unsynchronized method. With JIT compilers, this performance gap
has increased to 50 to 100 times (notice the timing above shows it to be 97
times slower). Avoid synchronized methods if you can – if you
can’t, synchronizing on methods rather than on code blocks is slightly
faster.
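The two forms look like this (a hypothetical counter class, for illustration only):

class Counter {
  private int count;

  // Synchronized method: slightly faster in the JDK interpreter
  synchronized void increment() { count++; }

  // Equivalent synchronized block form
  void incrementBlock() {
    synchronized(this) {
      count++;
    }
  }
}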
Reusing objects: It takes a
long time to create an object (the timing above shows 980 assignment times for a
new Object, and 3100 assignment times for a small new array), so
it’s often worth saving and updating the fields of an old object instead
of creating a new object. For example, rather than creating a new Font
object in your paint( ) method, you can declare it an instance
object, initialize it once, and then just update it when necessary in
paint( ). See also Bentley, Programming Pearls p. 81
[15].
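A sketch of the Font example (the particular font choice is arbitrary):

import java.awt.*;
import java.applet.Applet;

public class FontReuse extends Applet {
  // Created once as an instance field, not on every repaint
  private Font font = new Font("Serif", Font.BOLD, 18);

  public void paint(Graphics g) {
    g.setFont(font); // Reuse the existing object
    g.drawString("Reusing a Font object", 20, 20);
  }
}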
Exceptions: You should only
throw exceptions in abnormal situations, which are usually cases in which
performance is not an issue since the program has run into a problem that it
doesn’t normally expect. When optimizing, combine small
try-catch blocks, which thwart compiler optimization by breaking
the code into small independent sections. On the other hand, be careful of
sacrificing the robustness of your code by over-zealous removal of exception
handling.
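A sketch of combining blocks (the method names are hypothetical):

import java.io.IOException;

class CombineTryCatch {
  void separateBlocks() {
    // Two small blocks break the surrounding code into independent sections:
    try { stepOne(); } catch(IOException e) { report(e); }
    try { stepTwo(); } catch(IOException e) { report(e); }
  }
  void combinedBlock() {
    // One larger block leaves a bigger region for the compiler to optimize:
    try {
      stepOne();
      stepTwo();
    } catch(IOException e) {
      report(e);
    }
  }
  void stepOne() throws IOException {}
  void stepTwo() throws IOException {}
  void report(IOException e) { System.err.println(e); }
}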
Hashing: The standard
Hashtable class in Java 1.0 and 1.1 requires casting and costly
synchronization (570 assignment times). Furthermore, the early JDK library
makes no effort to choose prime table sizes. Finally, a hashing
function should be designed for the particular characteristics of the keys
actually used. For all these reasons, the generic Hashtable can be
improved by designing a hash class that fits a particular application.
Note that the HashMap in the Java 1.2 collections library has much
greater flexibility and isn’t automatically
synchronized.
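As an illustration, here is a minimal sketch of an application-specific hash table for int keys, unsynchronized and with a prime table size (all names are illustrative, and a real version would need rehashing as it fills):

public class IntHash {
  private static final int SIZE = 101; // A prime, unlike early Hashtable defaults

  private static class Entry {
    final int key;
    Object value;
    Entry next;
    Entry(int key, Object value, Entry next) {
      this.key = key; this.value = value; this.next = next;
    }
  }

  private final Entry[] table = new Entry[SIZE];

  public void put(int key, Object value) {
    int i = (key & 0x7FFFFFFF) % SIZE;
    for(Entry e = table[i]; e != null; e = e.next)
      if(e.key == key) { e.value = value; return; }
    table[i] = new Entry(key, value, table[i]); // Chain on collision
  }

  public Object get(int key) {
    int i = (key & 0x7FFFFFFF) % SIZE;
    for(Entry e = table[i]; e != null; e = e.next)
      if(e.key == key) return e.value;
    return null;
  }
}

Because the keys are primitive ints, there is no casting and no synchronization overhead on either operation.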
Method inlining: Java
compilers can inline a method only if it is final, private, or
static, and in some cases it must have no local variables. If your code
spends a lot of time calling a method that has none of these modifiers, consider
writing a version that is final.
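For example:

class Point {
  private int x, y;
  // final makes this a candidate for inlining by the compiler
  final int getX() { return x; }
  // private methods are implicitly final, so they are equally inlinable
  private int getY() { return y; }
}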
I/O: Use buffers wherever
possible, otherwise you can end up doing I/O a single byte at a time. Note that
the JDK 1.0 I/O classes use a lot of synchronization, so you might get better
performance by using a single “bulk” call such as
readFully( ) and then interpreting the data yourself. Also notice
that the Java 1.1 “reader” and “writer” classes were
designed for improved performance.
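A sketch of the bulk-read approach (the file name and length handling are illustrative):

import java.io.*;

class BulkRead {
  static byte[] readAll(String name, int length) throws IOException {
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(new FileInputStream(name)));
    byte[] data = new byte[length];
    in.readFully(data); // One bulk call; interpret the bytes yourself
    in.close();
    return data;
  }
}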
Casts and instanceof:
Casts take from 2 to 200 assignment times. The more costly ones require
traveling up the inheritance hierarchy. Other costly operations are those that
lose, and must later restore, the capabilities of the lower-level construct.
Graphics: Use clipping to
reduce the amount of work done in repaint( ), double buffering to
improve perceived speed, and image strips or compression to speed downloading
times. Animation in Java Applets from JavaWorld and Performing
Animation from Sun are two good tutorials. Remember to use high-level
primitives; it’s much faster to call drawPolygon( ) on a bunch
of points than looping with drawLine( ). If you must draw a
one-pixel-wide line, drawLine(x,y,x,y) is faster than
fillRect(x,y,1,1).
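A sketch of the drawPolygon( ) point (the coordinates are arbitrary):

import java.awt.*;
import java.applet.Applet;

public class PolyDemo extends Applet {
  private int[] xs = { 10, 60, 110, 60 };
  private int[] ys = { 60, 10, 60, 110 };

  public void paint(Graphics g) {
    // One high-level call instead of a loop of drawLine( ) calls:
    g.drawPolygon(xs, ys, xs.length);
  }
}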
Using API classes: Use
classes from the Java API when they offer native machine performance that you
can’t match using Java. For example, System.arraycopy( ) is much
faster than using a loop to copy an array of any significant size.
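A sketch comparing the two approaches (the array size is arbitrary):

class CopyDemo {
  public static void main(String[] args) {
    int[] src = new int[10000];
    int[] dest = new int[10000];
    // Loop copy: one assignment at a time
    for(int i = 0; i < src.length; i++)
      dest[i] = src[i];
    // Native bulk copy: much faster for arrays of any significant size
    System.arraycopy(src, 0, dest, 0, src.length);
  }
}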
Replacing API classes:
Sometimes API classes do more than you need, with a corresponding increase in
execution time. For these you can write specialized versions that do less but
run faster. For example, one application that needed a container to store many
arrays was sped up by replacing the original Vector with a faster dynamic
array of objects.
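A minimal sketch of such a replacement, doing less than Vector but running faster because it is unsynchronized (the class name and growth policy are illustrative):

public class ObjectArray {
  private Object[] data = new Object[16];
  private int size = 0;

  public void add(Object o) {
    if(size == data.length) { // Grow by doubling when full
      Object[] bigger = new Object[data.length * 2];
      System.arraycopy(data, 0, bigger, 0, size);
      data = bigger;
    }
    data[size++] = o;
  }

  public Object get(int i) { return data[i]; } // No lock, no extra checks
  public int size() { return size; }
}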
[1] MicroBenchmark running on a
Pentium Pro (200 MHz), Netscape 3.0, JDK 1.1.4 (see reference [5]
below).
[2] Sun’s Java document page
on the JDK Java interpreter
http://java.sun.com/products/JDK/tools/win32/java.html
[3] Vladimir Bulatov’s
HyperProf
http://www.physics.orst.edu/~bulatov/HyperProf
[4] Greg White’s
ProfileViewer
[5] The premier online references
for optimizing Java code are Jonathan Hardwick’s Java Optimization
site at http://www.cs.cmu.edu/~jch/java/optimization.html,
“Tools for Optimizing Java” at
http://www.cs.cmu.edu/~jch/java/tools.html, and “Java
Microbenchmarks” (with a quick 45 second measurement benchmark) at
http://www.cs.cmu.edu/~jch/java/benchmarks.html.
[6] Make Java fast: Optimize!
How to get the greatest performance out of your code through low-level
optimizations in Java by Doug Bell
http://www.javaworld.com/javaworld/jw-04-1997/jw-04-optimize.html,
complete with an extensive annotated measurement Benchmark
applet.
[7] Java Optimization
Resources
http://www.cs.cmu.edu/~jch/java/resources.html
[8] Optimizing Java for
Speed http://www.cs.cmu.edu/~jch/java/speed.html
[9] An Empirical Study of
FORTRAN Programs by Donald Knuth, 1971, Software – Practice and
Experience, Volume 1 p. 105-33.
[10] Building High-Performance
Applications and Servers in Java: An Experiential Study, by Jimmy Nguyen,
Michael Fraenkel, Richard Redpath, Binh Q. Nguyen, and Sandeep K. Singhal; IBM
Software Solutions, IBM T.J. Watson Research Center.
http://www.ibm.com/java/education/javahipr.html.
[11] Advanced Java, Idioms,
Pitfalls, Styles, and Programming Tips, by Chris Laffra, Prentice
Hall, 1997. (Java 1.0) Chapter Sections
11-20.
[12] Data Structures and C
Programs by Christopher J. Van Wyk, Addison-Wesley, 1988.
[13] Writing Efficient Programs
by Jon Bentley, Prentice Hall, 1982, especially p. 110 and p.
145-151.
[14] More Programming Pearls
by Jon Bentley. Association for Computing Machinery, February
1988.
[15] Programming Pearls by
Jon Bentley, Addison-Wesley 1989. Part II addresses generic performance
enhancements.
[16] Code Complete: A Practical
Handbook of Software Construction by Steve McConnell, Microsoft Press 1993,
Chapter 9.
[17] Object-Oriented System
Development by Champeaux, Lea, and Faure, Chapter 25.
[18] The Art of Computer Programming
by Donald Knuth, Volume 1 Fundamental Algorithms 3rd Edition,
1997; Volume 2, Seminumerical Algorithms 3rd Edition; Volume 3
Sorting and Searching 2nd Edition, Addison-Wesley. The
definitive encyclopedia of algorithms.
[19] Algorithms in C:
Fundamentals, Data Structures, Sorting, Searching by Robert Sedgewick,
3rd Edition, Addison-Wesley 1997. The author was a Ph.D. student of
Knuth’s. This is one of seven editions, each devoted to a particular language, and
contains timely, somewhat simpler treatments of
algorithms.