A Java myth busted (or is it?)

tl;dr for-loops are not necessarily way more performant than stream()

when stream() was introduced to the Java Collections in JDK 8, the immediate reaction was: "they are sooooo much slower than a normal for-loop".
I tested it and indeed it was slower. by a lot. (I remember Oracle admitting at some point that it is slower, but that the focus was on functionality and the performance would probably improve in the future. alas, I cannot find a source for my memory, so let's assume it is wrong.)

what definitely exists are lots of articles about how slow streams are compared to for-loops, e.g. by nipafx, who proved it with JMH, and by Angelika, with the compelling argument that the compiler optimization for loops is too good to be beaten by streams.

some developers took this fact and kept it stored in their brain forever. but streams were introduced in 2014. 8 years have passed. how does it look today? are they really as slow as some repeatedly declare? let's find out.

I wrote a set of benchmarks that (to my knowledge) follow the correct procedure in JMH:

create the data in a @State object
destroy the result in a blackhole

I let it process loops of 10, 10_000 and 10_000_000 entries. and these are the results:

10 entries

Benchmark                                               Mode  Cnt       Score       Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25  691000.985 ±  5338.170  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25  687244.094 ±  2287.375  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25  620127.959 ± 11149.611  ops/s

JmhStreamPerformanceMeasurement.collectFor             thrpt   25  601047.148 ±  6901.828  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25  593137.918 ±  7027.976  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25  583345.516 ±  2706.945  ops/s

JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  752205.384 ±  3155.479  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  753751.877 ±  2618.748  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  732847.868 ±  1268.374  ops/s

JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25  725538.827 ±   859.032  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  725200.238 ±   825.300  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  723650.793 ±  1007.079  ops/s

10_000 entries

Benchmark                                               Mode  Cnt      Score     Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25   4700.019 ±  13.206  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25   4613.177 ±  52.664  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25   4718.937 ± 232.897  ops/s

JmhStreamPerformanceMeasurement.collectFor             thrpt   25   1369.088 ±  10.711  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25   1337.578 ±  10.015  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25   1383.158 ±  49.265  ops/s

JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  39043.233 ± 708.907  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  42027.702 ±  91.457  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  40108.355 ± 123.484  ops/s

JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25   9309.883 ±  13.252  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  14033.988 ±  13.011  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  13440.062 ±  98.916  ops/s

10_000_000 entries

Benchmark                                               Mode  Cnt   Score   Error  Units
JmhStreamPerformanceMeasurement.collectFilteredFor     thrpt   25   1.256 ± 0.044  ops/s
JmhStreamPerformanceMeasurement.collectFilteredForGet  thrpt   25   1.240 ± 0.038  ops/s
JmhStreamPerformanceMeasurement.collectFilteredStream  thrpt   25   1.182 ± 0.052  ops/s

JmhStreamPerformanceMeasurement.collectFor             thrpt   25   0.321 ± 0.006  ops/s
JmhStreamPerformanceMeasurement.collectForGet          thrpt   25   0.324 ± 0.005  ops/s
JmhStreamPerformanceMeasurement.collectStream          thrpt   25   0.322 ± 0.006  ops/s

JmhStreamPerformanceMeasurement.easyTaskFor            thrpt   25  39.874 ± 0.326  ops/s
JmhStreamPerformanceMeasurement.easyTaskForGet         thrpt   25  40.546 ± 0.356  ops/s
JmhStreamPerformanceMeasurement.easyTaskStream         thrpt   25  40.263 ± 0.374  ops/s

JmhStreamPerformanceMeasurement.heavyTaskFor           thrpt   25  14.993 ± 0.083  ops/s
JmhStreamPerformanceMeasurement.heavyTaskForGet        thrpt   25  14.795 ± 0.091  ops/s
JmhStreamPerformanceMeasurement.heavyTaskStream        thrpt   25  14.746 ± 0.076  ops/s

Conclusion

the benchmarks ran for 6 hours and ironed out most of the peaks.
the result is in operations per second, so the bigger the better.

in case you're too lazy to look up what the benchmarks mean:

For is a normal modern for-loop for (X x:xs) that uses an iterator to run over the entries.
ForGet is an old-school for (int i = 0; i < xs.size(); i++) that then calls get(i) on an ArrayList.
Stream is the modern stream() variant.
Collect adds all entities of the list to a set.
CollectFiltered adds only selected values to the set
EasyTask sums up all the entries
HeavyTask does a bit more if else and math stuff with the entries.
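To make the names above a bit more concrete, here is a minimal sketch of how the three loop styles could look for EasyTask and CollectFiltered. This is not the actual benchmark code: the class name, the method names, and the "keep even numbers" filter are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class LoopVariants {

    // EasyTask, "For" style: enhanced for-loop, runs over an Iterator.
    static long sumFor(List<Integer> xs) {
        long sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return sum;
    }

    // EasyTask, "ForGet" style: index loop that calls get(i).
    static long sumForGet(List<Integer> xs) {
        long sum = 0;
        for (int i = 0; i < xs.size(); i++) {
            sum += xs.get(i);
        }
        return sum;
    }

    // EasyTask, "Stream" style.
    static long sumStream(List<Integer> xs) {
        return xs.stream().mapToLong(Integer::longValue).sum();
    }

    // CollectFiltered, "For" style. The even-number filter is a hypothetical predicate.
    static Set<Integer> collectFilteredFor(List<Integer> xs) {
        Set<Integer> result = new HashSet<>();
        for (Integer x : xs) {
            if (x % 2 == 0) {
                result.add(x);
            }
        }
        return result;
    }

    // CollectFiltered, "Stream" style with the same hypothetical predicate.
    static Set<Integer> collectFilteredStream(List<Integer> xs) {
        return xs.stream().filter(x -> x % 2 == 0).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            xs.add(i);
        }
        System.out.println(sumFor(xs));    // 45
        System.out.println(sumForGet(xs)); // 45
        System.out.println(sumStream(xs)); // 45
        System.out.println(collectFilteredFor(xs).equals(collectFilteredStream(xs))); // true
    }
}
```

All variants of a task compute the same result, so the benchmark only measures how the loop itself is driven.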

My expectations

I would guess that the ForGet benchmark will be the fastest of the three, because no Iterator is created and get(i) on an ArrayList is basically only a wrapped array access.

I would also assume that the For is faster than the Stream because it only generates one more Iterator instance, while stream() generates a bunch of instances to process the data.
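The "one more Iterator instance" comes from how the language defines the enhanced for-loop over an Iterable: it behaves like an explicit Iterator loop. A small sketch of that equivalence (class and method names are made up for illustration):

```java
import java.util.Iterator;
import java.util.List;

public class ForEachDesugared {

    // What you write: enhanced for-loop.
    static int sumEnhancedFor(List<Integer> xs) {
        int sum = 0;
        for (Integer x : xs) {
            sum += x;
        }
        return sum;
    }

    // Roughly what the loop above means: one Iterator is requested from the
    // list and then drives the loop via hasNext() / next().
    static int sumExplicitIterator(List<Integer> xs) {
        int sum = 0;
        for (Iterator<Integer> it = xs.iterator(); it.hasNext(); ) {
            Integer x = it.next();
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        System.out.println(sumEnhancedFor(xs));      // 10
        System.out.println(sumExplicitIterator(xs)); // 10
    }
}
```

Whether that single Iterator object costs anything measurable at runtime depends on the JIT, which is exactly what the benchmark is supposed to show.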

I also assume that this overhead will go away with longer loops. One instance vs. 10 instances over 10 million iterations is negligible.

The result

The data looks almost as expected, except that stream() does not look like it is always the slowest, not at all. Feel free to check my benchmark code, maybe I made a mistake.

It looks like with short loops (few entries) the stream is up to roughly 10% slower than a for-loop, but it depends very much on what you execute. collectFiltered is the worst case for 10 entries (about 620,000 vs 691,000 ops/s); collect and easyTask are about 3% behind, and heavyTask is practically identical.
but this already changes with 10_000 entries: there collectFiltered with a stream has the highest score, although the gap is well within its error margin.
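For anyone who wants to redo the percentage math, here is a small sketch that takes the collectFiltered scores from the tables above and computes the relative difference. The helper names are made up; only the numbers come from the results.

```java
import java.util.Locale;

public class RelativeDifference {

    // Difference of 'candidate' relative to 'baseline' in percent.
    // Negative means the candidate has lower throughput (is slower).
    static double percentDifference(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    // JMH error value expressed as a percentage of the score.
    static double errorPercent(double score, double error) {
        return error / score * 100.0;
    }

    public static void main(String[] args) {
        // 10 entries: collectFilteredFor vs collectFilteredStream
        double diff10 = percentDifference(691000.985, 620127.959);
        // 10_000 entries: collectFilteredFor vs collectFilteredStream
        double diff10k = percentDifference(4700.019, 4718.937);
        // Error margin of collectFilteredStream at 10_000 entries
        double err10k = errorPercent(4718.937, 232.897);

        System.out.println(String.format(Locale.ROOT, "10 entries:     %.1f%%", diff10));
        System.out.println(String.format(Locale.ROOT, "10_000 entries: %.1f%%", diff10k));
        System.out.println(String.format(Locale.ROOT, "stream error:   %.1f%%", err10k));
    }
}
```

With these inputs the stream comes out about 10.3% below the for-loop at 10 entries (or, seen from the other side, the for-loop is about 11.4% faster). At 10_000 entries the stream is about 0.4% ahead, which is much smaller than its own error of about 4.9% of the score.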

SOOOO I think the difference between the three measured loops is irrelevant.
I don't think that any of them is "way faster".
all three work very differently. some have more overhead but might be more intelligent, and none of them will be the bottleneck in any way.

some numbers seem odd and therefore I ran the benchmark twice to eliminate background processes interfering with the result.

Rule of thumb maybe:

short simple tasks: probably better with a for-loop.
long complex tasks: probably better with a stream().

as soon as you have to handle exceptions the for-loop is better anyway, because exception handling is terrible in stream(): the lambdas you pass to map() or filter() cannot throw checked exceptions.
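A minimal sketch of what that looks like in practice. The parse() helper and its IOException are hypothetical, picked only because the code needs some checked exception to show the problem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class CheckedExceptionInLoops {

    // Hypothetical helper that reports bad input with a checked exception.
    static int parse(String s) throws IOException {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new IOException("not a number: " + s, e);
        }
    }

    // for-loop: the checked exception simply propagates via the throws clause.
    static List<Integer> parseAllFor(List<String> inputs) throws IOException {
        List<Integer> result = new ArrayList<>();
        for (String s : inputs) {
            result.add(parse(s));
        }
        return result;
    }

    // stream: Function.apply() declares no checked exceptions, so the lambda
    // has to catch the IOException and rethrow it as an unchecked one.
    static List<Integer> parseAllStream(List<String> inputs) {
        return inputs.stream()
                .map(s -> {
                    try {
                        return parse(s);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .collect(Collectors.toList());
    }
}
```

Wrapping in UncheckedIOException is only one possible workaround. The point is that the stream version needs the try/catch inside the lambda, while the for-loop version does not need any.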

There is one benchmark that looks suspicious: heavyTaskFor with 10_000 entries (about 9,300 ops/s, while ForGet and Stream reach 13,000 to 14,000). I will repeat it and comment on it. I assume my machine did something weird at that time. please ignore it for now.

cheers, thanks for reading.
