The Intel Threading Building Blocks
or tbb
is a library used to parallelize program in C++
. It does not depends on compiler like cilkplus
, it is more like OpenMP
but with higher level access, and easier to handle with lambda from C++11
.
-
initialize task scheduler. The scheduler is expensive to open up and shut down. Always put the initializer at the very beginning and try not to create any scheduler afterwards.
-
three main kinds of loops
parallel_for
,parallel_reduce
,parallel_scan
are three main kinds of loops used.parallel_for
The loop should be independent with each other, the iteration number must be fixed.
#include "tbb/blocked_range.h" class ApplyFunc { private: T* a_; public: void operator() (const blocked_range<size_t>& r) const { for (auto i : r) { Func(a_[i]); } } ApplyFunc(T a[]): a_(a) {} }
#include "tbb/parallel_for.h" void ParallelApplyFunc(T a[]) { parallel_for(blocked_range<size_t>(0, n), ApplyFunc(a), auto_partitioner()); }
-
parallel_reduce
The loop should be independent with each other, with fixed number iterations. The reduce operation applies whenever we are evaluating something like . This fork-join loop requires a splitting constructor and ajoin
method to work. Very easy to design. -
parallel_scan
This is used to calculate parallel prefix. Parallel version uses a two-pass method.
-
queue based parallelism For those tree structure,
parallel_while
could be a choice. Though fetching is serialized, there isadd
which can make use of the resources by adding more tasks. -
parallel_sort
Taken as a replacement ofqsort
. -
concurrent containers
Queue
,Vector
,HashMap
, look familiar, just have concurrent access.