java - String Pool: "Te"+"st" faster than "Test"?

Saturday, 5 August 2017

I am running a performance benchmark on the String pool, but the results are not what I expected.

I wrote three static methods:

perform0() method ... creates a new object every time

perform1() method ... String literal "Test"

perform2() method ... String constant expression "Te"+"st"

My expectation was (1. fastest -> 3. slowest):

1. "Test", because of string pooling

2. "Te"+"st", because of string pooling, but a bit slower than 1 because of the + operator

3. new String(..), because there is no string pooling

But the benchmark shows that "Te"+"st" is slightly faster than "Test".

new String(): 141677000 ns 
"Test"      : 1148000 ns 
"Te"+"st"   : 1059000 ns

new String(): 141253000 ns
"Test"      : 1177000 ns
"Te"+"st"   : 1089000 ns


new String(): 142307000 ns
"Test"      : 1878000 ns
"Te"+"st"   : 1082000 ns

new String(): 142127000 ns
"Test"      : 1155000 ns
"Te"+"st"   : 1078000 ns
...

Here's the code:

import java.util.concurrent.TimeUnit;


public class StringPoolPerformance {

    public static long perform0() {
        long start = System.nanoTime();

        for (int i=0; i<1000000; i++) {
            String str = new String("Test");
        }
        return System.nanoTime()-start;
    }

    public static long perform1() {
        long start = System.nanoTime();
        for (int i=0; i<1000000; i++) {
            String str = "Test";

        }
        return System.nanoTime()-start;
    }

    public static long perform2() {
        long start = System.nanoTime();
        for (int i=0; i<1000000; i++) {
            String str = "Te"+"st";
        }
        return System.nanoTime()-start;

    }

    public static void main(String[] args) {
        long time0=0, time1=0, time2=0;
        for (int i=0; i<100; i++) {
            // result
            time0 += perform0();
            time1 += perform1();
            time2 += perform2();
        }


        System.out.println("new String(): " +  time0 + " ns");
        System.out.println("\"Test\"      : " + time1 + " ns");
        System.out.println("\"Te\"+\"st\"   : " + time2 + " ns");
    }
}

Can someone explain why "Te"+"st" performs faster than "Test"? Is the JVM doing some optimization here? Thank you.

Answer

"Te" + "st" is a compile-time constant expression, so at run time it behaves exactly like "Test". Any cost is paid when the code is compiled, not when it runs.

That's easily proven by disassembling your compiled benchmark class using javap -c StringPoolPerformance:

public static long perform1();
  Code:
...
   7:   ldc #3; //int 1000000
   9:   if_icmpge   21
   12:  ldc #5; //String Test
   14:  astore_3
   15:  iinc    2, 1
...

public static long perform2();
  Code:
...
   7:   ldc #3; //int 1000000
   9:   if_icmpge   21
   12:  ldc #5; //String Test
   14:  astore_3
   15:  iinc    2, 1
...

The bytecode of the two methods is absolutely identical! This is specified by the Java Language Specification, 15.18.1:

The String object is newly created (§12.5) unless the expression is a compile-time constant expression (§15.28).
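The same rule can be observed at run time by comparing references with ==. The following is a minimal sketch, not part of the original benchmark: the class and method names are made up for illustration, and it relies only on the language guarantee that strings produced by compile-time constant expressions are interned.

```java
// Hypothetical demo class; none of these names come from the original benchmark.
class ConstantFoldingDemo {

    // Both operands are literals, so the whole expression is a constant
    // expression and refers to the same pooled object as "Test".
    static boolean literalConcatIsPooled() {
        return ("Te" + "st") == "Test";
    }

    // A final local initialized with a literal is a constant variable,
    // so this concatenation is still a constant expression.
    static boolean finalVariableConcatIsPooled() {
        final String prefix = "Te";
        return (prefix + "st") == "Test";
    }

    // A non-final local is not a constant variable: the concatenation is
    // performed at run time and produces a new String object.
    static boolean nonFinalVariableConcatIsPooled() {
        String prefix = "Te";
        return (prefix + "st") == "Test";
    }

    // new String(..) always creates a distinct object, as in perform0().
    static boolean newStringIsPooled() {
        return new String("Test") == "Test";
    }

    public static void main(String[] args) {
        System.out.println("literal concat pooled:   " + literalConcatIsPooled());
        System.out.println("final variable pooled:   " + finalVariableConcatIsPooled());
        System.out.println("non-final variable pooled: " + nonFinalVariableConcatIsPooled());
        System.out.println("new String pooled:       " + newStringIsPooled());
    }
}
```

Only the first two comparisons are true, which matches the disassembly: the compiler has already replaced "Te"+"st" with the single constant "Test" before the JVM ever sees it.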

The difference you measured is probably due to ordinary run-to-run variability or to flaws in the benchmark itself. See this question: How do I write a correct micro-benchmark in Java?

Some notable rules you break:

You don't discard the results of the "warm-up" iterations of your test kernel.

You don't have GC logging enabled (particularly relevant when perform1() is always being run right after the test which creates a million objects).
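As a rough illustration of the two rules above, here is a sketch of the same three kernels with warm-up rounds whose timings are thrown away. It samples collection counts through GarbageCollectorMXBean as a simple stand-in for GC logging. The class name, the round counts, and the checksum/sink trick are assumptions made for this example, not part of the original benchmark. A hand-rolled harness like this can still be misled by the JIT, so a dedicated harness such as JMH is generally the safer choice.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;

// Hypothetical harness; names and round counts are illustrative assumptions.
class WarmedUpBenchmark {

    static final int WARMUP_ROUNDS = 20;     // assumed value, tune per machine
    static final int MEASURED_ROUNDS = 100;  // same as the original main loop
    static final int ITERATIONS = 1_000_000;

    // Each kernel publishes a checksum here so its loop has an observable result.
    static volatile long sink;

    static long newString() {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            checksum += new String("Test").length();
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    static long literal() {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            checksum += "Test".length();
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    static long constantExpression() {
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            checksum += ("Te" + "st").length();
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    // Sums collection counts over all collectors; a collector reports -1
    // when its count is undefined, so negative values are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Runs the kernel and deliberately ignores the timings.
    static void warmUp(LongSupplier round) {
        for (int i = 0; i < WARMUP_ROUNDS; i++) {
            round.getAsLong();
        }
    }

    // Sums the timings of the measured rounds only.
    static long measure(LongSupplier round) {
        long total = 0;
        for (int i = 0; i < MEASURED_ROUNDS; i++) {
            total += round.getAsLong();
        }
        return total;
    }

    static void report(String label, LongSupplier round) {
        warmUp(round);
        long gcBefore = totalGcCount();
        long nanos = measure(round);
        long gcRuns = totalGcCount() - gcBefore;
        System.out.println(label + ": " + nanos + " ns, GC runs during measurement: " + gcRuns);
    }

    public static void main(String[] args) {
        report("new String()", WarmedUpBenchmark::newString);
        report("\"Test\"      ", WarmedUpBenchmark::literal);
        report("\"Te\"+\"st\"   ", WarmedUpBenchmark::constantExpression);
    }
}
```

Unlike the original main loop, each kernel here runs all of its rounds before the next one starts, so garbage left behind by the allocating kernel is less likely to be collected in the middle of another kernel's measured rounds. The reported GC count makes any collection that still happens visible.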
