Home > Software design >  Java string concatenation optimisation is applied in this case?
Java string concatenation optimisation is applied in this case?

Time:05-24

Let's imagine I have a lib which contains the following simple method:


private static final String CONSTANT = "Constant";

public static String concatStringWithCondition(String condition) {
   return "Some phrase"   condition   CONSTANT;

}

What if someone wants to use my method in a loop? As I understand, that string optimisation (where gets replaced with StringBuilder or whatever is more optimal) is not working for that case? Or this is valid for strings initialised outside of the loop?

I'm using java 11 (Dropwizard).

Thanks.

CodePudding user response:

No, this is fine.

The only case that string concatenation can be problematic is when you're using a loop to build one single string. Your method by itself is fine. Callers of your method can, of course, mess things up, but not in a way that's related to your method.

CodePudding user response:

The code as written should be as efficient as making a StringBuilder and appending these 3 constants to it. There certainly is absolutely no difference at all between a literal ("Some phrase"), and an expression that the compiler can treat as a Compile Time Constant (which CONSTANT, here, clearly is - given that CONSTANT is static, final, not null, and of a CTCable type (All primitives and strings)).

However, is that 'efficient'? I doubt it - making a stringbuilder is not particularly cheap either. It's orders of magnitude cheaper than continually making new strings, sure, but there's always a bigger fish:

It doesn't matter

Computers are fast. Really, really fast. It is highly likely that you can write this incredibly badly (performance wise) and it still won't be measurable. You won't even notice. Less than a millisecond slower.

In general, anybody that worries about performance at this level simply lacks perspective and knowledge: If you apply that level of fretting to your java code and you have the knowledge to know what could in theory be non-perfectly-performant, you'll be sweating every 3rd character you ever type. That's no way to program. So, gain that perspective (or take it from me, "just git gud" is not exactly something you can do in a week - take it on faith for now, as you learn you can start verifying) - and don't worry about it. Unless you actually run into an actual situation where the code is slower than it feels like it could be, or slower than it needs to be, and then toss profilers and microbenchmark testing frameworks at it, and THEN, armed with all that information (and not before!), consider optimizing. The reports tell you what to optimize, because literally less than 1% of the code is responsible for 99% of the performance loss, so spending any time on code that isn't in that 1% is an utter waste of time, hence why you must get those reports first, or not start at all.

... or perhaps it does

But if it does matter, and it's really that 1% of the code that is responsible for 99% of the loss, then usually you need to go a little further than just 'optimize the method'. Optimize the entire pipeline.

What is happening with this string? Take that into consideration.

For example, let's say that it, itself, is being appended to a much bigger stringbuilder. In which case, making a tiny stringbuilder here is incredibly inefficient compared to rewriting the method to:

public static void concatStringWithCondition(StringBuilder sb, String condition) {
   sb.append("Some phrase").append(condition).append(CONSTANT);
}

Or, perhaps this data is being turned into bytes using UTF_8 and then tossed onto a web socket. In that case:

private static final byte[] PREFIX = "Some phrase".getBytes(StandardCharsets.UTF_8);
private static final byte[] SUFFIX = "Some Constant".getBytes(StandardCharsets.UTF_8);

public void concatStringWithCondition(OutputStream out, String condition) {
  out.write(PREFIX);
  out.write(condition.getBytes(StandardCharsets.UTF_8));
  out.write(SUFFIX);
}
  • and check if that outputstream is buffered. If not, make it buffered, that'll help a ton and would completely dwarf the cost of not using string concatenation. If the 'condition' string can get quite large, the above is no good either, you want a CharsetEncoder that encodes straight to the OutputStream, and may even want to replace all that with some ByteBuffer based approach.

Conclusion

  1. Assume performance is never relevant until it is.
  2. IF performance truly must be tackled, strap in, it'll take ages to do it right. Doing it 'wrong' (applying dumb rules of thumb that do not work) isn't useful. Either do it right, or don't do it.
  3. IF you're still on bard, always start with profiler reports and use JMH to gather information.
  4. Be prepared to rewrite the pipeline - change the method signatures, in order to optimize.
  5. That means that micro-optimizing, which usually sacrifices nice abstracted APIs, is actively bad for performance - because changing pipelines is considerably more difficult if all code is micro-optimized, given that this usually comes at the cost of abstraction.

And now the circle is complete: Point 5 shows why the worrying about performance as you are doing in this question is in fact detrimental: It is far too likely that this worry results in you 'optimizing' some code in a way that doesn't actually run faster (because the JVM is a complex beast), and even if it did, it is irrelevant because the code path this code is on is literally only 0.01% or less of the total runtime expenditure, and in the mean time you've made your APIs worse and lack abstraction which would make any actually useful optimization much harder than it needs to be.

But I really want rules of thumb!

Allright, fine. Here are 2 easy rules of thumb to follow that will lead to better performance:

  1. When in rome...

The JVM is an optimising marvel and will run the craziest code quite quickly anyway. However, it does this primarily by being a giant pattern matching machine: It finds recognizable code snippets and rewrites these to the fastest, most carefully tuned to juuust your combination of hardware machine code it can. However, this pattern machine isn't voodoo magic: It's got limited patterns. Which patterns do JVM makers 'ship' with their JVMs? Why, the common patterns, of course. Why include a pattern for exotic code virtually nobody ever writes? Waste of space.

So, write code the way java programmers tend to write it. Which very much means: Do not write crazy code just because you think it might be faster. It'll likely be slower. Just follow the crowd.

Trivial example:

Which one is faster:

List<String> list = new ArrayList<String>();
for (int i = 0; i < 10000; i  ) list.add(someRandomName());

// option 1:

String[] arr = list.toArray(new String[list.size()]);

// option 2:

String[] arr = list.toArray(new String[0]);

You might think, obviously, option 1, right? Option 2 'wastes' a string array, making a 0-length array just to toss it in the garbage right after. But you'd be wrong: Option 2 is in fact faster (if you want an explanation: The JVM recognizes it, and does a hacky move: It makes an new string array that does not need to be initialized with all zeroes first. Normal java code cannot do this (arrays are neccessarily initialized blank, to prevent memory corruption issues), but specifically .toArray(new X[0])? Those pattern matching machines I told you about detect this and replace it with code that just blits the refs straight into a patch of memory without wasting time writing zeroes to it first.

It's a subtle difference that is highly unlikely to matter - it just highlights: Your instincts? They will mislead you every time.

Fortunately, .toArray(new X[0]) is common java code. And easier and shorter. So just write nice, convenient code that looks like how other folks write and you'd have gotten the right answer here. Without having to know such crazy esoterics as having to reason out how the JVM needs to waste time zeroing out that array and how hotspot / pattern matching might possibly eliminate this, thus making it faster. That's just one of 5 million things you'd have to know - and nobody can do that. Thus: Just write java code in simple, common styles.

  1. Algorithmic complexity is a thing hotspot can't fix for you

Given an O(n^3) algorithm fighting an O(log(n) * n^2) algorithm, make n large enough and the second algorithm has to win, that's what big O notation means. The JVM can do a lot of magic but it can pretty much never optimize an algorithm into a faster 'class' of algorithmic complexity. You might be surprised at the size n has to be before algorithmic complexity dominates, but it is acceptable to realize that your algorithm can be fundamentally faster and do the work on rewriting it to this more efficient algorithm even without profiler reports and benchmark harnesses and the like.

  •  Tags:  
  • java
  • Related