Notes on a Java fork/join framework example
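Java 7 introduced the fork/join framework, but I had never tried it out directly. In real projects I have rarely written fork/join code myself. I have mostly run into the framework indirectly through third-party components, for example Akka's fork-join-executor, and sbt, which by default runs test cases concurrently on fork/join. fork/join helps break a computation into finer-grained tasks and make more effective use of multiple CPU cores.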
[Copyright Notice]
This article is licensed under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
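fork/join is somewhat similar to map-reduce, and in the Java 7 days I essentially overlooked it. I have recently been looking into Java 8's parallelStream, whose underlying implementation is also fork/join, and that made me curious enough to give fork/join a quick try. Simply put, the fork/join algorithm works in three steps. It recursively splits a computation task in half until it cannot be split any further. Multiple cores (threads) then compute the resulting subtasks. Finally, the results are combined back up in the reverse direction.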
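Since parallelStream is what brought me to fork/join here, the same 1 to 100,000 summation can be sketched as a parallel stream for comparison. This is only an illustrative sketch, and the class ParallelStreamSum and its methods are hypothetical names. As far as I know, parallel streams run on the shared ForkJoinPool.commonPool() rather than on a pool you create yourself. The recorded thread names should therefore typically look like ForkJoinPool.commonPool-worker-N, alongside the calling main thread.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.LongStream;

class ParallelStreamSum {

    // Sums 1..n with a parallel stream; the stream framework splits the range
    // into subtasks that run on fork/join worker threads (plus the calling thread).
    static long sum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Collects the names of the threads that processed at least one element.
    static Set<String> threadsUsed(long n) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        LongStream.rangeClosed(1, n)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Sum: " + sum(100_000L)); // Sum: 5000050000
        System.out.println("Threads: " + threadsUsed(100_000L));
    }
}
```

The sum is deterministic. The set of thread names varies from run to run and from machine to machine.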
Below is a diagram taken from Java 8 in Action that illustrates the fork/join process.
The following demo code helps show how fork/join works:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

public class ForkJoinDemo {

    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 100000L).toArray();
        ForkJoinTask<Long> task = new ForkJoinSumCalculator(numbers);
        Long result = new ForkJoinPool().invoke(task);
        System.out.printf("Final result: %s, CPU cores: %s%n", result,
                Runtime.getRuntime().availableProcessors());
    }

}


class ForkJoinSumCalculator extends RecursiveTask<Long> {

    private final long[] numbers;
    private final int start;
    private final int end;

    // Ranges of at most this many elements are summed sequentially
    public static final long THRESHOLD = 10_000L;

    public ForkJoinSumCalculator(long[] numbers) {
        this(numbers, 0, numbers.length);
    }

    private ForkJoinSumCalculator(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            return computeSequentially();
        }

        // fork() queues a task in the pool (an idle worker may steal it); compute() runs in the current thread
        // return new ForkJoinSumCalculator(numbers, start, start + length / 2).fork().join()
        //         + new ForkJoinSumCalculator(numbers, start + length / 2, end).compute();

        ForkJoinSumCalculator leftTask = new ForkJoinSumCalculator(numbers, start, start + length / 2);
        leftTask.fork();

        ForkJoinSumCalculator rightTask = new ForkJoinSumCalculator(numbers, start + length / 2, end);

        Long rightResult = rightTask.compute();
        Long leftResult = leftTask.join();

        return leftResult + rightResult;
    }

    private long computeSequentially() {
        System.out.printf("Summation from %s to %s, calculated by thread %s%n", start, (end - 1), Thread.currentThread().getName());
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += numbers[i];
        }
        return sum;
    }
}
```

A fork/join task extends RecursiveTask<T>. Its compute() method decides both how finely the work is split and how the partial results are merged.
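leftTask.fork() schedules the left subtask asynchronously. The subtask is pushed onto the current worker thread's queue, where an idle worker in the pool can steal and execute it.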
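rightTask.compute() keeps the current thread busy with the right half. There is no point in handing the current thread back to the pool only to pick up another task. Writing rightTask.fork().join() instead also gives the correct result.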
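The same "fork one half, compute the other" split can also be written with the static helper ForkJoinTask.invokeAll(t1, t2), which returns once both tasks are done. As far as I can tell from the JDK source, it forks the second task and runs the first one directly in the calling thread. That is the same idea as leftTask.fork() followed by rightTask.compute(), just with the halves swapped. Below is a sketch of the summation written that way. InvokeAllSumTask is a hypothetical name and not part of the original demo.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

// Hypothetical variant of ForkJoinSumCalculator that delegates fork/compute/join to invokeAll.
class InvokeAllSumTask extends RecursiveTask<Long> {

    static final int THRESHOLD = 10_000;

    private final long[] numbers;
    private final int start;
    private final int end;

    InvokeAllSumTask(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        int mid = start + length / 2;
        InvokeAllSumTask left = new InvokeAllSumTask(numbers, start, mid);
        InvokeAllSumTask right = new InvokeAllSumTask(numbers, mid, end);
        // Runs both halves and returns only when both have completed.
        invokeAll(left, right);
        // Both tasks are already done, so join() just returns their results.
        return left.join() + right.join();
    }

    static long sum(long[] numbers) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new InvokeAllSumTask(numbers, 0, numbers.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] numbers = LongStream.rangeClosed(1, 100_000L).toArray();
        System.out.println(sum(numbers)); // 5000050000
    }
}
```

One practical benefit of invokeAll is ordering: it makes it hard to get the fork, compute and join steps in the wrong order.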
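Note: the code above only demonstrates the fork/join process. Here fork/join does not actually improve performance. Each subtask is cheap, so the supporting work outweighs the actual computation: splitting tasks (fork), merging results (join), and creating and coordinating multiple threads. The purpose of fork/join is to split up time-consuming tasks, so that multiple cores can be fully used and the overall computation finishes more efficiently.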
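One rough way to see the overhead mentioned in the note is to time a plain loop against a fork/join version on the same 100,000-element array. The sketch below is only indicative. It is a single run without warm-up, measured with System.nanoTime(), so the numbers will vary between runs and machines. A benchmark harness such as JMH would be needed for trustworthy results. OverheadCheck and its threshold parameter are my own additions for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

class OverheadCheck {

    // Same splitting scheme as ForkJoinSumCalculator, with a configurable threshold.
    static class SumTask extends RecursiveTask<Long> {
        private final long[] a;
        private final int lo;
        private final int hi;
        private final int threshold;

        SumTask(long[] a, int lo, int hi, int threshold) {
            this.a = a;
            this.lo = lo;
            this.hi = hi;
            this.threshold = threshold;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= threshold) {
                return sequentialSum(a, lo, hi);
            }
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(a, lo, mid, threshold);
            left.fork();                                                // queue the left half
            long right = new SumTask(a, mid, hi, threshold).compute(); // right half in this thread
            return left.join() + right;
        }
    }

    static long sequentialSum(long[] a, int lo, int hi) {
        long sum = 0;
        for (int i = lo; i < hi; i++) {
            sum += a[i];
        }
        return sum;
    }

    static long forkJoinSum(ForkJoinPool pool, long[] a, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1"); // 0 would never stop splitting
        }
        return pool.invoke(new SumTask(a, 0, a.length, threshold));
    }

    public static void main(String[] args) {
        long[] a = LongStream.rangeClosed(1, 100_000L).toArray();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            long t0 = System.nanoTime();
            long s1 = sequentialSum(a, 0, a.length);
            long t1 = System.nanoTime();
            long s2 = forkJoinSum(pool, a, 10_000);
            long t2 = System.nanoTime();
            System.out.printf("sequential: %d in %d us%n", s1, (t1 - t0) / 1_000);
            System.out.printf("fork/join:  %d in %d us%n", s2, (t2 - t1) / 1_000);
        } finally {
            pool.shutdown();
        }
    }
}
```

Lowering the threshold (for example to 100) creates far more subtasks for the same amount of arithmetic. That should make the splitting overhead more pronounced.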
Here is the output:
```text
Summation from 18750 to 24999, calculated by thread ForkJoinPool-1-worker-4
Summation from 6250 to 12499, calculated by thread ForkJoinPool-1-worker-0
Summation from 93750 to 99999, calculated by thread ForkJoinPool-1-worker-1
Summation from 87500 to 93749, calculated by thread ForkJoinPool-1-worker-7
Summation from 56250 to 62499, calculated by thread ForkJoinPool-1-worker-6
Summation from 43750 to 49999, calculated by thread ForkJoinPool-1-worker-2
Summation from 81250 to 87499, calculated by thread ForkJoinPool-1-worker-5
Summation from 68750 to 74999, calculated by thread ForkJoinPool-1-worker-3
Summation from 37500 to 43749, calculated by thread ForkJoinPool-1-worker-2
Summation from 75000 to 81249, calculated by thread ForkJoinPool-1-worker-1
Summation from 50000 to 56249, calculated by thread ForkJoinPool-1-worker-7
Summation from 0 to 6249, calculated by thread ForkJoinPool-1-worker-0
Summation from 12500 to 18749, calculated by thread ForkJoinPool-1-worker-4
Summation from 25000 to 31249, calculated by thread ForkJoinPool-1-worker-5
Summation from 31250 to 37499, calculated by thread ForkJoinPool-1-worker-2
Summation from 62500 to 68749, calculated by thread ForkJoinPool-1-worker-3
Final result: 5000050000, CPU cores: 8
```

fork/join runs on a ForkJoinPool. The pool's default parallelism equals the number of logical cores, i.e. the value of Runtime.getRuntime().availableProcessors(), and my machine has 8. The output shows the work split into chunks of 6,250 numbers each. The 100,000-element array is halved until a range is no larger than THRESHOLD = 10,000, which takes four halvings. The chunks are executed by the pool threads ForkJoinPool-1-worker-0 through ForkJoinPool-1-worker-7.
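The pool size and chunk size discussed above can be checked directly. The sketch below uses a hypothetical class name, ChunkReplay. It prints getParallelism() for a freshly created ForkJoinPool next to availableProcessors(). It then replays the splitting rule of compute(), without any threads, to list the leaf ranges for 100,000 elements with a threshold of 10,000. The range is halved four times, giving 16 leaves of 6,250 elements each, which matches the 16 "Summation" lines in the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;

class ChunkReplay {

    // Mirrors the splitting rule of ForkJoinSumCalculator.compute():
    // a range is summed directly once its length is <= threshold,
    // otherwise it is split at start + length / 2.
    static void collectLeaves(int start, int end, long threshold, List<int[]> out) {
        int length = end - start;
        if (length <= threshold) {
            out.add(new int[] {start, end - 1}); // same "from .. to" indices the demo prints
            return;
        }
        int mid = start + length / 2;
        collectLeaves(start, mid, threshold, out);
        collectLeaves(mid, end, threshold, out);
    }

    static List<int[]> leaves(int n, long threshold) {
        List<int[]> out = new ArrayList<>();
        collectLeaves(0, n, threshold, out);
        return out;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println("Default parallelism: " + pool.getParallelism()
                + ", availableProcessors: " + Runtime.getRuntime().availableProcessors());
        pool.shutdown();

        List<int[]> chunks = leaves(100_000, 10_000L);
        System.out.println("Number of chunks: " + chunks.size()); // Number of chunks: 16
        for (int[] c : chunks) {
            System.out.println("Summation from " + c[0] + " to " + c[1]);
        }
    }
}
```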
The key to fork/join lies in how the task is split and how the partial results are merged.
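In the example you can switch to the commented-out code instead:

```java
return new ForkJoinSumCalculator(numbers, start, start + length / 2).fork().join()
        + new ForkJoinSumCalculator(numbers, start + length / 2, end).compute();
```

It looks exactly equivalent, but the output after running it puzzled me a bit: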
```text
Summation from 0 to 6249, calculated by thread ForkJoinPool-1-worker-3
Summation from 6250 to 12499, calculated by thread ForkJoinPool-1-worker-1
Summation from 12500 to 18749, calculated by thread ForkJoinPool-1-worker-2
Summation from 18750 to 24999, calculated by thread ForkJoinPool-1-worker-2
Summation from 25000 to 31249, calculated by thread ForkJoinPool-1-worker-2
Summation from 31250 to 37499, calculated by thread ForkJoinPool-1-worker-1
Summation from 37500 to 43749, calculated by thread ForkJoinPool-1-worker-1
Summation from 43750 to 49999, calculated by thread ForkJoinPool-1-worker-1
Summation from 50000 to 56249, calculated by thread ForkJoinPool-1-worker-1
Summation from 56250 to 62499, calculated by thread ForkJoinPool-1-worker-1
Summation from 62500 to 68749, calculated by thread ForkJoinPool-1-worker-1
Summation from 68750 to 74999, calculated by thread ForkJoinPool-1-worker-1
Summation from 75000 to 81249, calculated by thread ForkJoinPool-1-worker-1
Summation from 81250 to 87499, calculated by thread ForkJoinPool-1-worker-1
Summation from 87500 to 93749, calculated by thread ForkJoinPool-1-worker-1
Summation from 93750 to 99999, calculated by thread ForkJoinPool-1-worker-1
Final result: 5000050000, CPU cores: 8
```

Only 2 or 3 threads take part in the computation here, rather than all of the pool's threads as before. This comes down to ordering. In the commented-out expression, Java evaluates the left operand of + first. So fork().join() waits for the left half to finish, often by simply running it in the current thread, before the right half's compute() even starts. As a result, the two halves never run at the same time. The order has to be fork, then compute, then join, i.e. the basic pattern is:

```java
leftTask.fork();
rightTask.compute();
leftTask.join();
```

Permalink: https://yanbin.blog/java-fork-join-framework-memo/, from 隔叶黄莺 Yanbin's Blog
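To make the effect of ordering easier to see than by scanning the log, here is a sketch that runs both variants and counts how many distinct threads summed a leaf range. OrderingDemo, its boolean flag and the thread-name set are hypothetical additions for illustration, not part of the original demo. The counts depend on the machine and on scheduling, so they will differ between runs, and no particular value should be expected.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

class OrderingDemo {

    static final int THRESHOLD = 10_000;

    static class SumTask extends RecursiveTask<Long> {
        private final long[] a;
        private final int lo;
        private final int hi;
        private final boolean forkComputeJoin; // true: fork, compute, join; false: fork().join() + compute()
        private final Set<String> threads;     // names of threads that summed a leaf range

        SumTask(long[] a, int lo, int hi, boolean forkComputeJoin, Set<String> threads) {
            this.a = a;
            this.lo = lo;
            this.hi = hi;
            this.forkComputeJoin = forkComputeJoin;
            this.threads = threads;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                threads.add(Thread.currentThread().getName());
                long sum = 0;
                for (int i = lo; i < hi; i++) {
                    sum += a[i];
                }
                return sum;
            }
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(a, lo, mid, forkComputeJoin, threads);
            SumTask right = new SumTask(a, mid, hi, forkComputeJoin, threads);
            if (forkComputeJoin) {
                left.fork();               // left half becomes available to idle workers
                long r = right.compute();  // this thread works on the right half meanwhile
                return left.join() + r;    // then collect the left half
            }
            // The left operand is evaluated completely before the right one, so this
            // thread waits for the left half before it starts on the right half.
            return left.fork().join() + right.compute();
        }
    }

    // Sums 1..100000 with the chosen ordering, recording leaf threads into 'threads'.
    static long run(boolean forkComputeJoin, Set<String> threads) {
        long[] a = LongStream.rangeClosed(1, 100_000L).toArray();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(a, 0, a.length, forkComputeJoin, threads));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (boolean fcj : new boolean[] {true, false}) {
            Set<String> threads = ConcurrentHashMap.newKeySet();
            long sum = run(fcj, threads);
            System.out.printf("%s: sum=%d, leaf threads=%d%n",
                    fcj ? "fork, compute, join" : "fork().join() + compute()", sum, threads.size());
        }
    }
}
```

On a multi-core machine, the first variant would usually report more distinct threads than the second, in line with the two logs above. Work stealing is nondeterministic, though, so this is a tendency rather than a guarantee.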