Home > Mobile >  Does Thread.sleep have no effect in Stream processing?
Does Thread.sleep have no effect in Stream processing?

Time:12-31

The following program is from OCP Study Guide by Jeanne Boyarsky and Scott Selikoff:

import java.util.*;

class WhaleDataCalculator {
    public int processRecord(int input) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            // Handle interrupted exception
        }
        return input   1;
    }

    public void processAllData(List<Integer> data) {
        data.stream().map(a -> processRecord(a)).count();
    }

    public static void main(String[] args) {
        WhaleDataCalculator calculator = new WhaleDataCalculator();
        // Define the data
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < 4000; i  )
            data.add(i);
        // Process the data
        long start = System.currentTimeMillis();
        calculator.processAllData(data);
        double time = (System.currentTimeMillis() - start) / 1000.0;
        // Report results
        System.out.println("\nTasks completed in: "   time   " seconds");
    }
}

The authors claim

Given that there are 4,000 records, and each record takes 10 milliseconds to process, by using a serial stream(), the results will take approximately 40 seconds to complete this task.

However, when I am running this in my system, it is taking between 0.006 seconds to 0.009 seconds on every run.

Where is the discrepancy?

CodePudding user response:

That's because of the use of count, which performs a trick in later Java versions.

Since you're only interested in the number of elements, count will try to get the size directly from the source, and will skip most other operations. This is possible because you are only doing a map and not, for example, a filter, so the number of elements will not change.

If you add peek(System.out::println), you'll see no output as well.

If you call forEach instead of count, running the code will probably take 40 seconds.

CodePudding user response:

The call of .map(a -> processRecord(a)) did not run at all, the reason is because you are running this program with a JDK version more than 1.8.

Let's take this example to make it easy to understand:

long number = Stream.of("x", "x", "x").map(e -> {
            System.out.println("Hello");
            return e;
        }).count();
        
        System.out.println(number);

Try to run it using a JDK 1.8 , after that run it using a JDK 11.

In java 8, count() acts as a terminal operation, all the intermediate operations(map method here) will be executed, the map operation will be executed and will print the hello message. you will get this output :

Hello
Hello
Hello
3

In greater than 1.8 Java versions, 11 as example here, Java can determine the number of elements of the stream directly, if there is no intermediate operation that can change the number of the elements of the stream (example : filter() ), no intermediate method will be executed, just the count method will be executed, so you will not see any hello message but the number of the element of this stream will be calculated and you can use it. your output will be like that :

3

If you like to see the hello message in the Java versions greater than 1.8, you should add an intermediate operation to your stream pipeline that can change the number of element of the stream, let's add the filter method to the pipeline and see the output on java 11 :

long number = Stream.of("x", "x", "x").map(e -> {
            System.out.println("Hello");
            return e;
        }).filter(element-> element.equals("x")).count();
        
        System.out.println(number);

Here the output :

Hello
Hello
Hello
3

CodePudding user response:

Since Java 9 operation count() has been optimized in such so that if during the initialization of the stream (when stages of the pipeline are being chained) it turns out that there are no operations which can change the number of elements in the stream source allows evaluating the number of elements it contains, then count() does not triggers the execution of the pipeline, but instead ask the source "how many of these guys do you have?" and immediately returns the value.

Here's a quote from the documentation:

The number of elements covered by the stream source, a List, is known and the intermediate operation, peek, does not inject into or remove elements from the stream (as may be the case for flatMap or filter operations). Thus the count is the size of the List and there is no need to execute the pipeline and, as a side-effect, print out the list elements.

And since you're not using the value of count and just need to fire a side-effect on each element of the list, then you don't need a Stream. Instead, you can use Iterable.forEach():

public void processAllData(List<Integer> data) {
    data.forEach(a -> processRecord(a));
}
  • Related