Subscribe by Email

Your email:

Multicore Programming Blog

Current Articles | RSS Feed RSS Feed

Threads Ace Meets Cilk++

Posted by John Hart on Tue, Sep 23, 2008
 | Submit to Digg digg it | Submit to Reddit reddit | Add to delicious delicious | Submit to StumbleUpon StumbleUpon | Share on Twitter Twitter 

The Journey Starts

I joined Cilk Arts three months ago and was truly enthused by the opportunity to work with very smart people developing an important technology, Cilk++, at a company with real potential for success. Adding to the excitement was the fact that I had prior experience developing and maintaining multithreaded applications. In fact, in moments of high self-esteem, I felt that I ranked as a "threads ace." Surely, Cilk++ was just another way of writing multithreaded applications; in short, "old wine in new bottles."

Mid-Journey Status

Three months have passed, I've learned a lot, and my expectations of Cilk Arts have been met.

However, while previous multithreading experience has certainly been helpful, the initial impressions vastly oversimplified the situation. By expanding my point of view about threads and parallelism, I've been able to exploit Cilk++ to accelerate performance in new and existing compute-bound programs simply and elegantly. There's a lot of good new wine in the bottles.

The Benefits

Anyone with threads experience, whether with Windows threads, Pthreads, or some other threading model, can quickly learn to use Cilk++, but some changed thinking will make you more productive more quickly. The benefits of learning Cilk++ and using it to eliminate unnecessary serial execution by parallelizing your single-threaded CPU-intensive applications include:

  • Program performance will improve, exploiting application parallelism and the power of multi-core computer systems.
  • Program conversion and development are straight-forward and involve the correct use of just three Cilk++ keywords and several additional Cilk++ features, such as reducers. Your parallelized code will be simpler and more elegant than anything you could write using Pthreads or Windows threads.
  • Programs will be reliable if you use Cilk++ tools along with sound programming practice.
  • The Cilk++ runtime will do all the hard work of scheduling for high performance on the multiple CPU cores.

What Do You Need to Know About Threads? Do You Need to be an Ace?

The more experience you have with multithreaded programming, the better. Even though many Cilk++ users have been successful with little prior threads experience, any experience creating, synchronizing, and managing concurrent tasks will give you a good head start, and your threading knowledge will not be obsolete. In short, you can "make new friends, but keep the old. One is silver, the other is gold."

In my own case, I've worked with threads for about a decade. Four chapters of my book, Windows System Programming, are dedicated to Windows threads, and I've developed and frequently delivered a multi-day Pthreads professional training course. Consulting activities have involved development of large multithreaded systems, and frequently I can help colleagues isolate their bugs involving deadlocks, missed signals (always a risk in Windows), synchronization issues, performance problems, and many other nasty threading problems. Maybe that's not quite "ace level," but I felt ready to tackle Cilk++.  Perhaps my experiences can help other thread-aware software engineers get up to speed with Cilk++ quickly and painlessly, with fun and excitement.

What Cilk++ Does Very Well, and What It Doesn't Try to Do

A summary of Cilk++ and parallelization theory and practice is available in our e-Book How to Survive the Multicore Software Revolution. Here, I'll describe Cilk++ from the point of view of the hypothetical "threads ace."

What Cilk++ Achieves

Cilk++ makes it easy for you to modify existing or new C++ applications to improve performance on systems with multiple CPUs (sometimes known as "shared-memory multiprocessor" or "SMP"). Cilk++ enhances application performance by accelerating serial portions of your code that are inherently parallel.

Cilk++ is easy to use as it uses just three key words which you add to the C++ code to create and synchronize "strands," which are the Cilk++ units of parallel execution (they are very similar to threads, although there are differences). As shown in the e-Book (p. 16), if you were using Pthreads or Windows threads to create parallel computation threads, you'd need to:

  1. Use complex, ugly and high-overhead system calls such as pthread_create() or Windows' _beginthreadex() (if you don't know already, you don't want to know).
  2. Manage system resources, such as handles.
  3. Specify attributes for the threads.
  4. Synchronize on all the thread handles.

Certainly, you can manage threads this way, and I've done it numerous times, often improving application performance. There are other problems with such a threaded solution, however, and Cilk++ solves these problems transparently, giving much better performance than the do-it-yourself multithreading approach. Most importantly:

  1. Memory Usage. Cilk++ uses memory efficiently. Suppose, for example, that you wanted to create 1,000 concurrent threads to perform a computational task; this is not uncommon. Each thread would require its own stack (usually 1MB by default), or 1GB total. Cilk++ creates the thousand strands, if necessary, and bounds memory usage as a linear function of the number of CPUs or active "workers." The Cilk++ runtime is also much more intelligent about strand creation than the brute force thread creation.
  2. Scheduling and Context Switching. Cilk++ work stealing schedules strands to run on available CPUs and switches strands much more quickly than thread creation and context switching and with less chance of page faults and better memory locality. The Implementation of the Cilk-5 Multithreaded Language describes how this works.
  3. Reliability. Cilk++'s Cilkscreen tool will detect race conditions in your parallel code. What is more, you can use built-in or custom reducers to eliminate races on global variables easily and efficiently.
  4. Thread Synchronization. A single cilk_sync is all you need to synchronize a collection of strands. A cilk_for statement does not require any explicit syncrhonization.

In summary, the threads ace needs to learn that simplicity is strength and performing acts of heroic or virtuoso thread management is counter-productive. Instead, understand the parallelization inherent to the application and use Cilk++ keywords to express that parallelization and to manage the details.

What Cilk++ Does Not Try to Do

Cilk++ is targeted towards parallelizing CPU-intensive serial code to improve performance and utilize multiple cores; it is not designed to solve all problems for which you might want to use threads. For example, Cilk++ is not an optimal tool if you are building a producer-consumer system or have tightly coupled threads sending each other messages and commands.

When first encountering Cilk++, I immediately noticed what seemed like missing features, such as events, condition variables, thread pools, and thread-local storage. In fact, these features are simply not required; they address problems not in Cilk++'s domain, or they are not necessary since Cilk++ uses different and more powerful techniques.

Expand Your Point of View (and Other Lessons)

Everything you already know and understand about concurrent operation, synchronization, race conditions, deadlocks, and so on, still applies when you use Cilk++. Nonetheless, when first encountering Cilk++, keep a few points in mind:

  1. Cilk++ is targeted at making your serial applications run faster. It is not appropriate for all multithreaded scenarios, such as transactional workloads, and this limitation is deliberate.
  2. Cilk++ can manage parallel computation more efficiently and reliably than is possible with a do-it-yourself approach.
  3. You can use Cilk++ to parallelize independent operations in loop iterations. You can also easily parallelize recursive, divide and conquer (RDC) algorithm implementations. So, if you are like me and shy away from RDC, think again. RDC expands your choices and is very Cilk++ friendly. The Cilk Arts site has sample sorting and matrix multiply programs that illustrate this point.
  4. Remember everything you learned as a threads ace. In particular, remember the risks and all the ways a program could fail with deadlocks, bad synchronization, and more. Use Cilk++ features to help you avoid these problems.
  5. This was the hardest for me - There is a well-developed theory of parallelism, with key concepts such as work and span. Cilk++ uses these concepts, and, as a software engineer, it is essential to take the time to think about and use these concepts when designing your system. (The e-Book summarizes the theory.)

There are a few old lessons to remember as well, as using Cilk++ to enhance performance pops old lessons right to the top of the stack:

  1. A buggy program will still be buggy under Cilk++. Logic flaws won't go away just because you convert to Cilk++. In fact, the logic flaws may be harder to find. Fortunately, Cilk++ lets you create the serial version (serialization) of your program so that you can debug and test as if the program were still written in C++.
  2. An inefficient program will almost certainly be inefficient in Cilk++. You still need to use good algorithms, give attention to memory access patterns to optimize cache usage, and even perform such mundane tasks as hand optimizing your code (optimizing compilers still miss things).
  3. Think simplicity. If your program is getting complex, maybe you are doing something wrong.
  4. Test and analyze your code.
  5. Be humble (bugs happen), and review the classic multithreaded maxims; I like the last chapter in Programming with POSIX® Threads, where we learn, for example, that you should "never bet on a thread race."

Tags: , , , , , ,

COMMENTS

not to be a jerk - i very much appreciate anybody taking the concurrency bull by the horns and making the world safer and better for us regular grunts - but how much do you think Cilk++ would not have to be done the way it is were it for a different programming language and thus paradigm? i'm just somewhat saddened that there is all this good work on avoiding the bad "shared mutable state" approach to concurrency (see JavaCSP, Erlang, Scala, CSP, Haskell, Clojure -- yes, nothing as 'mainstream' as C++, of course) but then the C++ world, sort of like classical Java, but Java is even recognizing the shared mutable thing is bad with recent concurrency updates, is still suck in that scary place, no? 
 
also - could you make the comment entry text box any smaller?! 
 
ok, ok, i'm being a jerk after all.

posted @ Tuesday, September 23, 2008 3:32 PM by Raoul Duke


Raoul, 
 
Thanks for your comment. Yes, other languages might be better for parallel programming than C++. However, Cilk Arts' mandate is to offer customers the tools they need to multicore-enable their existing applications. In other words, those millions of lines written when C++ was the shiny new language on the block are now "legacy code". It could cost millions of dollars to parallelize C++ legacy code by rewriting it some other, parallel-friendly language. Cilk++ can help you do that job for a fraction of the cost. Don't think that we aren't thinking about other languages that can be parallelized, or about developing our own parallel language for the future. But our research shows that we can server the largest number users in the short-term by using C++ as our base language.

posted @ Tuesday, September 23, 2008 4:27 PM by Pablo Halpern


I know what you mean; nothing wrong with being pragmatic and nothing wrong with helping real world folks and issues and nothing wrong in making actual money because that's how you make actual money. 
 
Good and very interesting and intriguing to hear that there are other things afoot at Cilk perhaps looking at other approaches to concurrency!

posted @ Tuesday, September 23, 2008 4:32 PM by Raoul Duke


Post Comment
Name
 *
Email
 *
Website (optional)
Comment
 *

Allowed tags: <a> link, <b> bold, <i> italics

Receive email when someone replies.