This appendix was
contributed by and used with the permission of Joe Sharp, consultant
(SharpJoe@aol.com).
The Java
language emphasizes accurate, reliable behavior at the expense of performance.
This is reflected in features such as automatic garbage collection, rigorous
runtime checking, complete bytecode verification, and conservative runtime
synchronization. Availability on a wide range of platforms means that, at present,
Java runs on an interpreted virtual machine, which further handicaps performance.
On the subject of performance, Steve McConnell [16]
offers this advice: “Complete it first, and then perfect it. The part that needs to be
perfect is usually small.” This appendix will help you locate and
optimize that “part that needs to be
perfect.”
You should address performance only
after you have a correct and fully tested program:
Finding the critical
bottleneck is the key to cost-effective effort – Donald Knuth [9] improved
a program where 50 percent of the time was spent in less than 4 percent of the
code. He changed a few lines in an hour of work and doubled the program speed.
Working on the rest of the program would have dissipated his valuable time and
effort. To quote Knuth, “Premature optimization is the root of all
evil.” It is wise to restrain the impulse to optimize early, because doing so
may force you to forgo useful programming techniques, resulting in code that’s
harder to understand, riskier, and more expensive to
maintain.
“Profile” code by
inserting explicit timing:

long start = System.currentTimeMillis();
// Operation to be timed goes here
long time = System.currentTimeMillis() - start;
Have an infrequently used method
print the cumulative times to the console window with
System.out.println( ). Since the compiler eliminates the guarded code when
the flag is false, a static final boolean switch can turn the timing on and off,
so the timing code can efficiently be left in place in released code, ready for
emergency use at any time. Even when more sophisticated profiling is available,
this is a convenient way to time a specific task or operation.
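A minimal sketch of this switch pattern (the class and field names here are illustrative, not from the original):

public class TimingDemo {
  // Compile-time switch: when false, javac removes the guarded code entirely
  static final boolean TIMING = true;
  static long cumulative = 0;

  static void timedOperation() {
    long start = 0;
    if(TIMING)
      start = System.currentTimeMillis();
    // Operation to be timed goes here
    if(TIMING)
      cumulative += System.currentTimeMillis() - start;
  }

  // Infrequently-used method that reports the accumulated time
  static void reportTimes() {
    if(TIMING)
      System.out.println("Cumulative ms: " + cumulative);
  }
}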
System.currentTimeMillis( )
returns the time in milliseconds. However, some systems have a timer
resolution coarser than one millisecond (a Windows PC, for example), so you may
need to repeat an operation n times and divide the total time by n to get an
accurate estimate.
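For example, a sketch of this repetition technique (the repetition count is arbitrary):

public class RepeatTiming {
  public static void main(String[] args) {
    int n = 1000000; // Choose n large enough to swamp the timer resolution
    long start = System.currentTimeMillis();
    for(int i = 0; i < n; i++) {
      // Operation to be timed goes here
    }
    long total = System.currentTimeMillis() - start;
    System.out.println("Average ms per operation: " + (total / (double)n));
  }
}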
The JDK comes with a built-in
profiler that keeps track of the time spent in each routine and writes the
information to a file. Unfortunately, the JDK profilers have been uneven in
quality: the JDK 1.1.1 profiler works, but subsequent releases have had various
instabilities.
To run the profiler, use the
-prof option when invoking the unoptimized versions of the Java
interpreter, for example:
java_g -prof myClass
Or with an applet:
java_g -prof sun.applet.AppletViewer applet.html
The profiler output is not
particularly easy to decipher. In fact, in JDK 1.0 it truncates the method names
to 30 characters, so it might not be possible to distinguish between some
methods. However, if your platform does support the -prof option, either
Vladimir Bulatov’s HyperProf [3] or Greg White’s
ProfileViewer [4] will help interpret the
results.
The best way to keep up with the
exploding field of performance optimization tools is through a Web site such as
Jonathan Hardwick’s Tools for Optimizing Java [5] at
http://www.cs.cmu.edu/~jch/java/tools.html.
Now that the critical region has
been isolated, you can apply two types of
optimizations: generic techniques and those
specific to Java.
An effective generic speedup is to
redefine the problem in a more practical way. For example, in Programming
Pearls [14], Bentley describes Doug McIlroy’s representation of the
English language with a novel data depiction that enabled him to produce a
remarkably fast, compact spelling checker. In addition, choosing a better
algorithm will probably give a bigger performance gain than any other approach,
particularly as the size of the data set increases. For more of these generic
approaches, see the general book listings [12-19] at the end of this appendix.
To put things in perspective,
it’s useful to look at the time it takes to perform various operations. So
that the results are relatively independent of the computer being used, they
have been normalized by dividing by the time it takes to make a local
assignment.
Operation                  | Example                      | Normalized time
---------------------------|------------------------------|----------------
Local assignment           | i = n;                       | 1.0
Instance assignment        | this.i = n;                  | 1.2
int increment              | i++;                         | 1.5
byte increment             | b++;                         | 2.0
short increment            | s++;                         | 2.0
float increment            | f++;                         | 2.0
double increment           | d++;                         | 2.0
Empty loop                 | while(true) n++;             | 2.0
Ternary expression         | (x<0) ? -x : x               | 2.2
Math call                  | Math.abs(x);                 | 2.5
Array assignment           | a[0] = n;                    | 2.7
long increment             | l++;                         | 3.5
Method call                | funct( );                    | 5.9
throw and catch exception  | try{ throw e; } catch(e){}   | 320
synchronized method call   | synchMethod( );              | 570
New Object                 | new Object( );               | 980
New array                  | new int[10];                 | 3100
Using present systems (such as a
200 MHz Pentium Pro running Netscape 3 and JDK 1.1.5), these relative times show the
extraordinary cost of creating new objects and arrays, the heavy cost of synchronization,
and the modest cost of an unsynchronized method call. References [5] and [6]
give the Web addresses of measurement applets you can run on your own
machine.
Here are some modifications that
you can make to speed up time-critical parts of your Java program. (Be sure to
test the performance before and after you try them.)
The cost of Strings: The
String concatenation operator + looks innocent but involves a lot of
work. The compiler can efficiently concatenate constant strings, but a variable
string requires considerable processing. For example, if s and t
are String variables:
System.out.println("heading" + s + "trailer" + t);
this requires a new
StringBuffer, appending arguments, and converting the result back to a
String with toString( ). This costs both space and time. If
you’re appending more than one String, consider using a
StringBuffer directly, especially if you can repeatedly reuse it in a
loop. Preventing the creation of a new StringBuffer on each iteration
avoids the relative object-creation cost of 980 shown in the table above. Using
substring( ) and the other String methods is usually an
improvement. When feasible, character arrays are even faster. Also notice that
StringTokenizer is costly because of synchronization.
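Here is a sketch of reusing one StringBuffer across loop iterations (the method and variable names are illustrative):

static void printAll(String[] lines) {
  StringBuffer buf = new StringBuffer();
  for(int i = 0; i < lines.length; i++) {
    buf.setLength(0); // Clear and reuse; no new StringBuffer per iteration
    buf.append("heading").append(lines[i]).append("trailer");
    System.out.println(buf.toString());
  }
}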
Synchronization: In the JDK
interpreter, calling a synchronized method is typically 10 times slower
than calling an unsynchronized method. With JIT compilers, this performance gap
has increased to 50 to 100 times (notice the timing above shows it to be 97
times slower). Avoid synchronized methods if you can – if you
can’t, synchronizing on methods rather than on code blocks is slightly
faster.
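The two forms look like this (a hypothetical counter class, for illustration only):

class Counter {
  private int count;

  // Synchronized method: slightly faster in the JDK interpreter
  synchronized void increment() { count++; }

  // Equivalent synchronized block form
  void incrementBlock() {
    synchronized(this) {
      count++;
    }
  }
}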
Reusing objects: It takes a
long time to create an object (the timing above shows 980 assignment times for a
new Object, and 3100 assignment times for a small new array), so
it’s often worth saving and updating the fields of an old object instead
of creating a new object. For example, rather than creating a new Font
object in your paint( ) method, you can declare it an instance
object, initialize it once, and then just update it when necessary in
paint( ). See also Bentley, Programming Pearls p. 81
[15].
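A sketch of the Font example (the particular font choice is arbitrary):

import java.awt.*;
import java.applet.Applet;

public class FontReuse extends Applet {
  // Created once as an instance field, not on every repaint
  private Font font = new Font("Serif", Font.BOLD, 18);

  public void paint(Graphics g) {
    g.setFont(font); // Reuse the existing object
    g.drawString("Reusing a Font object", 20, 20);
  }
}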
Exceptions: You should only
throw exceptions in abnormal situations, which are usually cases in which
performance is not an issue since the program has run into a problem that it
doesn’t normally expect. When optimizing, combine small
try-catch blocks, which thwart compiler optimization by breaking
the code into small independent sections. On the other hand, be careful of
sacrificing the robustness of your code by over-zealous removal of exception
handling.
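A sketch of combining blocks (the method names are hypothetical):

import java.io.IOException;

class CombineTryCatch {
  void separateBlocks() {
    // Two small blocks break the surrounding code into independent sections:
    try { stepOne(); } catch(IOException e) { report(e); }
    try { stepTwo(); } catch(IOException e) { report(e); }
  }
  void combinedBlock() {
    // One larger block leaves a bigger region for the compiler to optimize:
    try {
      stepOne();
      stepTwo();
    } catch(IOException e) {
      report(e);
    }
  }
  void stepOne() throws IOException {}
  void stepTwo() throws IOException {}
  void report(IOException e) { System.err.println(e); }
}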
Hashing: The standard
Hashtable class in Java 1.0 and 1.1 requires casting and costly
synchronization (570 assignment times). Furthermore, the early JDK library
makes no effort to choose prime table sizes. Finally, a hashing
function should be designed for the particular characteristics of the keys
actually used. For all these reasons, the generic Hashtable can be
improved by designing a hash class that fits a particular application.
Note that the HashMap in the Java 1.2 collections library has much
greater flexibility and isn’t automatically
synchronized.
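As an illustration, here is a minimal sketch of an application-specific hash table for int keys, unsynchronized and with a prime table size (all names are illustrative, and a real version would need rehashing as it fills):

public class IntHash {
  private static final int SIZE = 101; // A prime, unlike early Hashtable defaults

  private static class Entry {
    final int key;
    Object value;
    Entry next;
    Entry(int key, Object value, Entry next) {
      this.key = key; this.value = value; this.next = next;
    }
  }

  private final Entry[] table = new Entry[SIZE];

  public void put(int key, Object value) {
    int i = (key & 0x7FFFFFFF) % SIZE;
    for(Entry e = table[i]; e != null; e = e.next)
      if(e.key == key) { e.value = value; return; }
    table[i] = new Entry(key, value, table[i]); // Chain on collision
  }

  public Object get(int key) {
    int i = (key & 0x7FFFFFFF) % SIZE;
    for(Entry e = table[i]; e != null; e = e.next)
      if(e.key == key) return e.value;
    return null;
  }
}

Because the keys are primitive ints, there is no casting and no synchronization overhead on either operation.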
Method inlining: Java
compilers can inline a method only if it is final, private, or
static, and in some cases it must have no local variables. If your code
spends a lot of time calling a method that has none of these modifiers, consider
writing a version that is final.
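For example:

class Point {
  private int x, y;
  // final makes this a candidate for inlining by the compiler
  final int getX() { return x; }
  // private methods are implicitly final, so they are equally inlinable
  private int getY() { return y; }
}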
I/O: Use buffers wherever
possible, otherwise you can end up doing I/O a single byte at a time. Note that
the JDK 1.0 I/O classes use a lot of synchronization, so you might get better
performance by using a single “bulk” call such as
readFully( ) and then interpreting the data yourself. Also notice
that the Java 1.1 “reader” and “writer” classes were
designed for improved performance.
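A sketch of the bulk-read approach (the file name and length handling are illustrative):

import java.io.*;

class BulkRead {
  static byte[] readAll(String name, int length) throws IOException {
    DataInputStream in = new DataInputStream(
      new BufferedInputStream(new FileInputStream(name)));
    byte[] data = new byte[length];
    in.readFully(data); // One bulk call; interpret the bytes yourself
    in.close();
    return data;
  }
}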
Casts and instanceof:
Casts take from 2 to 200 assignment times. The more costly ones require
traveling up the inheritance hierarchy. Other costly operations are those that
lose, and must later restore, the capabilities of the lower-level construct.
Graphics: Use clipping to
reduce the amount of work done in repaint( ), double buffering to
improve perceived speed, and image strips or compression to speed downloading
times. Animation in Java Applets from JavaWorld and Performing
Animation from Sun are two good tutorials. Remember to use high-level
primitives; it’s much faster to call drawPolygon( ) on a bunch
of points than looping with drawLine( ). If you must draw a
one-pixel-wide line, drawLine(x,y,x,y) is faster than
fillRect(x,y,1,1).
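A sketch of the drawPolygon( ) point (the coordinates are arbitrary):

import java.awt.*;
import java.applet.Applet;

public class PolyDemo extends Applet {
  private int[] xs = { 10, 60, 110, 60 };
  private int[] ys = { 60, 10, 60, 110 };

  public void paint(Graphics g) {
    // One high-level call instead of a loop of drawLine( ) calls:
    g.drawPolygon(xs, ys, xs.length);
  }
}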
Using API classes: Use
classes from the Java API when they offer native machine performance that you
can’t match using Java. For example, System.arraycopy( ) is much
faster than using a loop to copy an array of any significant size.
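A sketch comparing the two approaches (the array size is arbitrary):

class CopyDemo {
  public static void main(String[] args) {
    int[] src = new int[10000];
    int[] dest = new int[10000];
    // Loop copy: one assignment at a time
    for(int i = 0; i < src.length; i++)
      dest[i] = src[i];
    // Native bulk copy: much faster for arrays of any significant size
    System.arraycopy(src, 0, dest, 0, src.length);
  }
}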
Replacing API classes:
Sometimes API classes do more than you need, with a corresponding increase in
execution time. For these you can write specialized versions that do less but
run faster. For example, one application that needed a container to store many
arrays was sped up by replacing the original Vector with a faster dynamic
array of objects.
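A minimal sketch of such a replacement, doing less than Vector but running faster because it is unsynchronized (the class name and growth policy are illustrative):

public class ObjectArray {
  private Object[] data = new Object[16];
  private int size = 0;

  public void add(Object o) {
    if(size == data.length) { // Grow by doubling when full
      Object[] bigger = new Object[data.length * 2];
      System.arraycopy(data, 0, bigger, 0, size);
      data = bigger;
    }
    data[size++] = o;
  }

  public Object get(int i) { return data[i]; } // No lock, no extra checks
  public int size() { return size; }
}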
[1] MicroBenchmark running on a
Pentium Pro (200 MHz), Netscape 3.0, JDK 1.1.4 (see reference [5]
below).
[2] Sun’s Java document page
on the JDK Java interpreter
http://java.sun.com/products/JDK/tools/win32/java.html
[3] Vladimir Bulatov’s
HyperProf
http://www.physics.orst.edu/~bulatov/HyperProf
[4] Greg White’s
ProfileViewer
[5] The premier online references
for optimizing Java code are Jonathan Hardwick’s Java Optimization
site at http://www.cs.cmu.edu/~jch/java/optimization.html,
“Tools for Optimizing Java” at
http://www.cs.cmu.edu/~jch/java/tools.html, and “Java
Microbenchmarks” (with a quick 45 second measurement benchmark) at
http://www.cs.cmu.edu/~jch/java/benchmarks.html.
[6] Make Java fast: Optimize!
How to get the greatest performance out of your code through low-level
optimizations in Java by Doug Bell
http://www.javaworld.com/javaworld/jw-04-1997/jw-04-optimize.html,
complete with an extensive annotated measurement Benchmark
applet.
[7] Java Optimization
Resources
http://www.cs.cmu.edu/~jch/java/resources.html
[8] Optimizing Java for
Speed http://www.cs.cmu.edu/~jch/java/speed.html
[9] An Empirical Study of
FORTRAN Programs by Donald Knuth, 1971, Software – Practice and
Experience, Volume 1 p. 105-33.
[10] Building High-Performance
Applications and Servers in Java: An Experiential Study, by Jimmy Nguyen,
Michael Fraenkel, Richard Redpath, Binh Q. Nguyen, and Sandeep K. Singhal; IBM
Software Solutions, IBM T.J. Watson Research Center.
http://www.ibm.com/java/education/javahipr.html.
[11] Advanced Java, Idioms,
Pitfalls, Styles, and Programming Tips, by Chris Laffra, Prentice
Hall, 1997. (Java 1.0) Chapter Sections
11-20.
[12] Data Structures and C
Programs by Christopher J. Van Wyk, Addison-Wesley, 1988.
[13] Writing Efficient Programs
by Jon Bentley, Prentice Hall, 1982, especially p. 110 and p.
145-151.
[14] More Programming Pearls
by Jon Bentley. Association for Computing Machinery, February
1988.
[15] Programming Pearls by
Jon Bentley, Addison-Wesley 1989. Part II addresses generic performance
enhancements.
[16] Code Complete: A Practical
Handbook of Software Construction by Steve McConnell, Microsoft Press 1993,
Chapter 9.
[17] Object-Oriented System
Development by Champeaux, Lea, and Faure, Chapter 25.
[18] The Art of Computer Programming
by Donald Knuth, Volume 1 Fundamental Algorithms 3rd Edition,
1997; Volume 2, Seminumerical Algorithms 3rd Edition; Volume 3
Sorting and Searching 2nd Edition, Addison-Wesley. The
definitive encyclopedia of algorithms.
[19] Algorithms in C:
Fundamentals, Data Structures, Sorting, Searching by Robert Sedgewick,
3rd Edition, Addison-Wesley 1997. The author was a Ph.D. student of
Knuth’s. This is one of seven editions, each devoted to a particular language, and
contains timely, somewhat simpler treatments of
algorithms.