The Intel Threading Building Blocks or tbb is a library used to parallelize program in C++. It does not depends on compiler like cilkplus, it is more like OpenMP but with higher level access, and easier to handle with lambda from C++11.

  • initialize task scheduler. The scheduler is expensive to open up and shut down. Always put the initializer at the very beginning and try not to create any scheduler afterwards.

  • three main kinds of loops parallel_for, parallel_reduce, parallel_scan are three main kinds of loops used.

    • parallel_for The loop should be independent with each other, the iteration number must be fixed.
    #include "tbb/blocked_range.h"
     
    class ApplyFunc {
    private:
      T* a_;
    public:
      void operator() (const blocked_range<size_t>& r) const {
        for (auto i : r) {
          Func(a_[i]);
        }
      }
      ApplyFunc(T a[]): a_(a) {}
    }
    #include "tbb/parallel_for.h"
     
    void ParallelApplyFunc(T a[]) {
      parallel_for(blocked_range<size_t>(0, n), ApplyFunc(a), auto_partitioner());
    }
    • parallel_reduce The loop should be independent with each other, with fixed number iterations. The reduce operation applies whenever we are evaluating something like . This fork-join loop requires a splitting constructor and a join method to work. Very easy to design.

    • parallel_scan This is used to calculate parallel prefix. Parallel version uses a two-pass method.

  • queue based parallelism For those tree structure, parallel_while could be a choice. Though fetching is serialized, there is add which can make use of the resources by adding more tasks.

  • parallel_sort Taken as a replacement of qsort.

  • concurrent containers Queue, Vector, HashMap, look familiar, just have concurrent access.