
You need best-in-class performance, and not just on the current generation of 4-core machines. You don't want to rewrite your code each time the number of cores increases. Moreover, you would like to ship a single binary to all your customers, including those who still run 1- or 2-core machines. As a result, you need linear scaling both up and down, with minimal overhead on a single processor.
The Cilk++ Runtime System enables a Cilk++ program to dynamically and automatically exploit an arbitrary number of available processor cores. Given sufficient parallelism and memory bandwidth, it delivers near-perfect linear speed-up as the number of cores increases. On a single core, typical programs run with negligible overhead (less than 2%).
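To make the "single binary, any number of cores" idea concrete, here is a minimal sketch in standard C++. It is not Cilk++ code and does not use the Cilk++ Runtime System: it uses `std::thread::hardware_concurrency` and `std::async` as a stand-in, and the helper names `worker_count` and `parallel_sum` are hypothetical, chosen only for illustration. The point it shows is that the degree of parallelism is chosen at run time, and that the single-worker case falls back to a plain serial loop with no threading cost.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: decide the number of workers at run time, not at
// compile time. hardware_concurrency() may return 0 when the core count
// is unknown, so fall back to a single worker in that case.
inline unsigned worker_count() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Hypothetical helper: sum a vector using one contiguous chunk per worker.
inline long long parallel_sum(const std::vector<int>& data,
                              unsigned workers = worker_count()) {
    if (workers == 0) workers = 1;

    // Single-worker (or tiny input) path: plain serial loop, no threads.
    if (workers == 1 || data.size() < workers) {
        return std::accumulate(data.begin(), data.end(), 0LL);
    }

    const auto chunk = static_cast<std::ptrdiff_t>(data.size() / workers);
    std::vector<std::future<long long>> parts;
    parts.reserve(workers - 1);

    auto begin = data.begin();
    for (unsigned i = 0; i + 1 < workers; ++i) {
        auto end = begin + chunk;
        // Each chunk is summed on its own thread.
        parts.push_back(std::async(std::launch::async, [begin, end] {
            return std::accumulate(begin, end, 0LL);
        }));
        begin = end;
    }

    // The calling thread handles the last chunk plus any remainder.
    long long total = std::accumulate(begin, data.end(), 0LL);
    for (auto& p : parts) total += p.get();
    return total;
}
```

Calling `parallel_sum(values)` with no second argument uses however many cores the machine reports, so the same binary runs serially on a 1-core machine and in parallel on a larger one. This sketch splits the work statically into equal chunks and makes no attempt to balance load dynamically, so it should be read only as an illustration of run-time adaptation, not as a model of how the Cilk++ runtime schedules work.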