Duncan McCallum, CEO of Cilk Arts, discusses the multicore programming challenge facing the industry, and the mission of Cilk Arts
Cilk Arts serves companies that develop performance-sensitive, CPU-bound applications for multicore processors. For our customers, exploiting all of the performance available in multicore processors is a competitive imperative. They face three challenges in doing so: development time, software reliability, and performance.
Development Time
The Multicore Challenge
Developing multi-threaded software is dramatically more complex than developing serial code. This complexity requires organizations to acquire new programming skills - forcing retraining or retooling of development teams. With any of the alternatives to Cilk, a legacy application must be redesigned before it can be multicore-enabled. These factors put enormous pressure on development schedules and introduce risk.
The Cilk++ Solution
The Cilk++ keywords are simple enough to learn in less than a day. As a result, any of a company's programmers can quickly become "multicore" developers using Cilk++. With Cilk++ you don't need to recruit new programmers or train existing programmers in a complicated new parallel programming model.
Use of Cilk++ requires little or no redesign of the original serial code, saving months to years of development time and dramatically reducing schedule risk. A Cilk++ program retains the serial semantics of the original code. The keywords can also easily be compiled out, allowing you to debug your application with the existing serial tools your programmers already know. Furthermore, customers can apply Cilk++ incrementally to their application - achieving rapid proof-of-benefit.
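To illustrate what "compiling the keywords out" can look like, here is a minimal sketch. It assumes the keyword spellings cilk_spawn and cilk_sync and defines them as empty macros, so the function below builds with any standard C++ compiler as plain serial code. The fib function is a hypothetical example chosen for illustration, not code from a customer application.

```cpp
#include <cassert>

// Assumption: the parallel keywords are spelled cilk_spawn and cilk_sync.
// Defining them as empty macros "compiles them out", leaving serial C++.
#define cilk_spawn
#define cilk_sync

// Recursive Fibonacci. With a Cilk++ compiler the spawned call could run in
// parallel with the second call; with the empty macros it runs serially and
// returns the same result.
long fib(int n) {
    assert(n >= 0);                   // precondition for this sketch
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);   // candidate for parallel execution
    long y = fib(n - 2);
    cilk_sync;                        // wait for the spawned call, if any
    return x + y;
}
```

Because the serial build is ordinary C++, it can be stepped through in an existing debugger and checked against existing regression tests.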
Software Reliability
The Multicore Challenge
When parallelism is introduced into an application, that application becomes vulnerable to "race conditions". A race condition occurs when concurrent software tasks access a shared memory location and at least one of the tasks stores a value into the location. Depending on the scheduling of the tasks, the software may behave differently. The result is software flaws that are nondeterministic and very difficult to detect during testing.
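The effect of scheduling on a racy update can be shown with a small, deterministic simulation. The sketch below does not use real threads. It replays two hypothetical schedules of the same two "increment" tasks against one shared counter, where each increment is a load followed by a store. All names (Event, run_schedule, and so on) are invented for this illustration.

```cpp
#include <cassert>
#include <vector>

// Each task increments the shared counter in two steps: load, then store.
enum class Step { Load, Store };

struct Event {
    int task;   // which task performs the step (0 or 1)
    Step step;  // which half of the increment it performs
};

// Replays a schedule against one shared counter and returns its final value.
int run_schedule(const std::vector<Event>& schedule) {
    int shared = 0;
    int local[2] = {0, 0};  // each task's private copy of the loaded value
    for (const Event& e : schedule) {
        assert(e.task == 0 || e.task == 1);
        if (e.step == Step::Load) {
            local[e.task] = shared;      // read the shared location
        } else {
            shared = local[e.task] + 1;  // write back the incremented copy
        }
    }
    return shared;
}

// Task 0 finishes before task 1 starts: both increments take effect.
int serial_result() {
    return run_schedule({{0, Step::Load}, {0, Step::Store},
                         {1, Step::Load}, {1, Step::Store}});
}

// Both tasks load before either stores: one increment is lost.
int interleaved_result() {
    return run_schedule({{0, Step::Load}, {1, Step::Load},
                         {0, Step::Store}, {1, Step::Store}});
}
```

The same two tasks produce 2 under the first schedule and 1 under the second, which is why such flaws can pass testing and still appear in production.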
The Cilk++ Solution
Because a Cilk++ program retains the serial semantics of the original program, the debug/test infrastructure already in place to test the serial version of an application remains unchanged. Since both the serial code and the serial regression tests are identical to the original, the serial correctness of a program is unchanged as well.
The Cilk++ race detector flags race conditions, assuring the parallel correctness of a program. This allows you to build multicore-enabled code that is as reliable as the original serial application.
Performance
The Multicore Challenge
Building applications that fully utilize all the cores in a chip multiprocessor (CMP) is difficult with existing approaches. An application must be tuned to run well on a predetermined number of processor cores.
This dependence on the number of processors means that applications must be modified for each successive processor generation. It also requires development organizations to support multiple versions of an application so that it can run on a heterogeneous collection of hardware platforms.
The Cilk++ Solution
Best-in-class Performance: The Cilk++ runtime library delivers performance equal to or better than the best hand-tuned codes in a fraction of the development time.
Linear Scaling as cores are added: On programs with sufficient parallelism, the Cilk++ scheduler delivers near-perfect linear speedup as cores are added, measured at 3.98x on 4 cores. In addition, the scheduler is indifferent to the number of processors being scheduled - it adapts dynamically.
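As a quick worked check of the figure above, using the standard definition of parallel efficiency (speedup divided by core count), where the symbols S_P, P and E are introduced here for illustration:

```latex
\[
E = \frac{S_P}{P} = \frac{3.98}{4} = 0.995
\]
```

That is, the quoted measurement corresponds to 99.5% of ideal linear scaling on 4 cores.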
Minimal Overhead on a Single Core: Cilk overheads on a single-core processor are negligible - often less than 2%. The result is that a Cilk application is not dependent on the number of cores available, and will run well even on a single-core machine. A customer can write one Cilk application and it will run optimally on any platform. Applications are future-proofed against the increasing core counts expected in future microprocessors.