The Intel Threading Building Blocks or tbb is a library used to parallelize program in C++. It does not depends on compiler like cilkplus, it is more like OpenMP but with higher level access, and easier to handle with lambda from C++11.
-
initialize task scheduler. The scheduler is expensive to open up and shut down. Always put the initializer at the very beginning and try not to create any scheduler afterwards.
-
three main kinds of loops
parallel_for,parallel_reduce,parallel_scanare three main kinds of loops used.parallel_forThe loop should be independent with each other, the iteration number must be fixed.
#include "tbb/blocked_range.h" class ApplyFunc { private: T* a_; public: void operator() (const blocked_range<size_t>& r) const { for (auto i : r) { Func(a_[i]); } } ApplyFunc(T a[]): a_(a) {} }#include "tbb/parallel_for.h" void ParallelApplyFunc(T a[]) { parallel_for(blocked_range<size_t>(0, n), ApplyFunc(a), auto_partitioner()); }-
parallel_reduceThe loop should be independent with each other, with fixed number iterations. The reduce operation applies whenever we are evaluating something like . This fork-join loop requires a splitting constructor and ajoinmethod to work. Very easy to design. -
parallel_scanThis is used to calculate parallel prefix. Parallel version uses a two-pass method.
-
queue based parallelism For those tree structure,
parallel_whilecould be a choice. Though fetching is serialized, there isaddwhich can make use of the resources by adding more tasks. -
parallel_sortTaken as a replacement ofqsort. -
concurrent containers
Queue,Vector,HashMap, look familiar, just have concurrent access.