Some parallel programming environments require the developer to relearn the fundamentals of programming in order to think in parallel. Cilk++ takes a different approach. One basic design principle of Cilk++ is that Cilk++ programs have serial semantics, that is, a Cilk++ program can be understood (and executed) as a serial program. The Cilk++ keywords were designed to make this serialization look similar to the parallel Cilk++ program.
This simple but powerful principle guides more than just the appearance of Cilk++ programs. It also simplifies the process of developing and testing a Cilk++ program, and it allows Cilk Arts to provide powerful, efficient, and provably correct tools.
(Serial semantics should not be confused with sequential consistency,
which is a concept invented by Leslie Lamport and discussed extensively
in the literature of parallel computing. A parallel computing system
with sequentially consistent memory guarantees that, in Lamport's
words, "... the result of any execution is the same as if the
operations of all the processors were executed in some sequential
order, and the operations of each individual processor appear in this
sequence in the order specified by its program." In contrast,
serial semantics means that the program can be executed using a single
thread of control as an ordinary serial program - one does not need a
conceptual parallel model to understand -- or execute -- the program.)
Cilk++ Serial Semantics
First, let's take a closer look at how Cilk++ provides serial semantics.
Cilk++ provides three new keywords: cilk_spawn, cilk_for, and cilk_sync. Each of these has an intuitive serial interpretation:
|
Keyword
|
Parallel
meaning
|
Serial
meaning
|
|
cilk_spawn
|
Allow
a called function to run in parallel with its caller
|
Call a function normally, where the
caller waits for the called function to return before resuming
|
|
cilk_sync
|
Wait
for parallel activity to complete
|
Do
nothing (there is no parallel activity)
|
|
cilk_for
|
Allow multiple iterations of the body
of a loop to run in parallel
|
Execute
a standard for loop in which
loop iterations execute one at a time
|
As you can see, a Cilk++ program can be converted to a serial program quite easily: simply replace cilk_for with for, and delete the cilk_spawn and cilk_sync keywords. The Cilk++ development system provides a compile-time switch to build the "serialization" of a Cilk++ program. Using Cilk++ for Linux, the option [-fcilk-stub] is used with cilk++ to "stub out" Cilk features. Using Cilk++ for Windows, run cilkpp with the [/cilkp cpp] option.
Okay, I hear you saying, "Fine, your fancy parallel program can devolve into a serial program. So what? If I wanted the serial version, I would have just used C++ in the first place!" Glad you raised the issue! A parallel programming model with serial semantics maintains four real advantages over program models without serial semantics.
- The equivalent serial C++ program can easily be debugged and analyzed using existing development tools. This is an ideal way to debug a parallel program without worrying about parallel interaction, and without needing to consider multiple stacks and program counters. Also, as vanilla C++, all symbol and debug information is available to standard system and third-party tools. Tools that are easy to run on the serialization but might require custom versions to operate on the parallel program include performance profilers, code coverage tools, security analysis tools, memory leak checkers, and any other tools that work with C++ programs.
- The serial semantics of Cilk++ programs allows Cilk Arts to build better tools that analyze the performance and correctness of Cilk++ programs, and these tools offer strong guarantees about the program. For example, our Cilkscreen race detector runs the test program in a serialized mode on a single processor. With information about the parallel structure of the program (i.e., where the spawns and syncs occur), Cilkscreen can analyze all possible schedules of the program that might execute on any number of processors. In other words, Cilkscreen can detect whether race conditions exist that could manifest under any possible execution of the program. The serial semantics of Cilk++ permits Cilkscreen to analyze the parallel program while running in a single worker (the Cilk++ name for a thread of execution) on a single processor.
- The serial semantics of Cilk++ provides a real advantage to program testing and quality assurance. Cilk++ is designed to ensure that parallel programs written in Cilk++ can be deterministic, reliably producing results identical to the serialization. This guarantee makes it easy to leverage traditional testing tools that assume that multiple runs of a program always return the same result if given the same input. The serial semantics of Cilk++ eliminates the need to consider timing and scheduling considerations when evaluating program correctness, and it makes it simple to compare the output of multiple program runs for consistency. Repeatability is one of the key properties that make Cilk++ programs testable and reliable.
- Serial semantics provides high performance while allowing you to design your program without specifying the number of processors on which the program will execute. The Cilk++ runtime scheduler takes care of efficiently load balancing your program across however many processors are available. When a portion of your parallel computation executes on a single processor, Cilk++ can execute it just like ordinary C++ code, taking full advantage of all the compiler optimizations and runtime efficiencies that a good C++ system offers. By starting from good single-core performance, Cilk++ ensures that a program with sufficient parallelism gets good speedup whether it is run on a large number of processors or just a few.
What's the Catch?
Perhaps this all sounds too good to be true - and maybe now you're thinking, "Okay, so what‘s the catch?" All right, there are a few things I've glossed over. For one, since there are some kinds of parallelism that actually don't have serial semantics, you can't write programs using these parallelization strategies in Cilk++. For example, Cilk++ doesn't support producer/consumer, software pipelining, or message passing. Nevertheless, although we're giving something up to have to serial semantics, our experience is that less is more. The four advantages outlined above readily make up for any loss in generality.
There's a second catch, however. Note that in the third point above, I said that Cilk++ programs can be deterministic, not that they are deterministic. How so? Well, the output of a program with a determinacy race (see "Are Determinacy Races Lurking in Your Multicore Application") can depend on how the program is scheduled on multiple processors, and the race condition that leads to this nondeterministic behavior is usually a bug. In most programming environments, you'd be stuck at this point trying to find the race bug by laborious program inspection. In contrast, in the Cilk++ programming environment, Cilkscreen can help you find and eliminate these races, giving you your determinism back. Thus, this "catch," though problematic, can be dealt with effectively precisely because of serial semantics: in effect, Cilkscreen uses the serialization as a benchmark for correctness of a parallel execution.
Sometimes, however, a Cilk++ program is intentionally nondeterministic. For example, some search programs use a concurrent hash table to "remember" the results of intermediate subsearches so that the subsearch doesn't have to be reexecuted if it reoccurs. In this context, each slot of the hash table may contain a mutual-exclusion lock to allow parallel branches of a computation to safely access and update the items stored in the slot. Cilkscreen correctly ensures that races are not declared on operations protected by the same lock.
In practice, however, we have found that most Cilk++ programs require few, if any, locks. In particular, Cilk++'s parallel control constructs obviate the need for the locks that other parallel programming models require for interthread communication and synchronization. Moreover, for many common situations where locking would seem to be necessary to avoid a race, Cilk++ provides hyperobjects, a novel data structure that can resolve races on global variables without sacrificing performance or determinism. By avoiding locks and using hyperobjects, Cilk++ sidesteps many performance anomalies caused by locks, such as lock contention, which can slow down parallel programs significantly, and deadlock, which may cause your application to freeze midexecution.
The Bottom Line
To sum up, Cilk++'s serial semantics provide three key benefits - performance, reliability, and productivity:
- Scalable performance comes from a model that doesn't rely on the programmer knowing the number of available processors in advance. Developers can use traditional serial performance tuning tools to find hotspots and improve both serial and parallel performance.
- Reliability is enhanced through the testing advantages of repeatability and determinism, as well as the ability to build an efficient and provably correct race detector that can compare the parallel semantics of the program with the serial semantics provided by the program's serialization.
- Finally, programmers can be more productive when they use familiar paradigms for new development, and of course, legacy serial code can be converted into Cilk++ easily, because it does not need to be rewritten to adopt a completely different programming model such as data-parallel or message-passing styles.