The Three Cilk++ Keywords

Programming with Cilk++ is simple - consisting of C++ with the addition of three keywords to indicate parallelism and synchronization. A C++ programmer uses the keywords to expose the parallelism in a program, and the Cilk runtime system takes the responsibility of scheduling the procedures efficiently:


cilk_for   ( T v = begin; v < end; v++)
      {
      statement1;
      statement2;
      ...
      }

cilk_for

    The cilk_for keyword enables a loop's iterations to execute in parallel. The loop index can be an arbitrary C++ random-access iterator whose difference type is an integer.

    cilk_spawn 

    int fib (int n) {
          if (n<2) return (n);
          else {
            int x,y;
            x = cilk_spawn fib(n-1);
            y = fib(n-2);
            cilk_sync;
            return (x+y);
            }
       }
    The cilk_spawn keyword allows the programmer to expose parallelism by identifying C++ function (or method) calls that can take place in parallel. As the name implies, the named child function is spawned, and can execute in parallel with the parent caller. But unlike the case of serial execution, where the parent is not resumed until after its child returns, in the case of a cilk_spawn, the parent can continue to execute in parallel with the child. Indeed, the parent can continue to spawn off children, producing a high degree of parallelism. The Cilk++ scheduler takes the responsibility of scheduling the spawned functions.

    cilk_sync

    A cilk_sync statement must be executed in order for a function to safely use the return values of the children it has spawned. If all of its children have not completed when it executes a cilk_sync, the function suspends and does not resume until all of its children have completed. The sync statement is a local "barrier," not a global one as, for example, is sometimes used in message-passing programming. A cilk_sync only waits for the spawned children of a function to complete, not for all currently executing parallel functions. When all its children return, execution of the function resumes at the point immediately following the cilk_sync statement. As an aid to programmers, the Cilk++ compiler inserts an implicit cilk_sync before every return, if it is not present already. As a consequence, a function never terminates while it has outstanding children.

    A cilk_spawn/cilk_sync is over 450 times faster than a Pthread create/exit - less than 4 times slower than an ordinary C function call. As a result, Cilk++ overhead typically measures just a couple percent on a single processor.

    The addition of two keywords parallelizes the core of Quicksort, a recursive divide-and-conquer list sorting algorithm.