Intel Parallel Studio 2011 Helps Out with the Hard Work

Intel Parallel Studio 2011 helps developers write multithreaded code in C++ that targets multiple processor cores.

When I reviewed the first version of Intel's Parallel Studio last year, it was already a solid tool for enhancing applications to take advantage of parallel processing across multiple CPU cores. The latest version of the product, which became available following last month's Intel Developer Forum (IDF), has grown better still at enabling developers to make the most of today's hardware.

Parallel Studio is actually a composite of four different products: Parallel Composer, Parallel Advisor, Parallel Inspector and Parallel Amplifier. As with the previous version, it installs as a large plug-in for Visual Studio (versions 2005, 2008 or 2010). The two main areas of improvement over last year's iteration are Parallel Composer, which adds support for new language extensions in the form of Cilk Plus, and the enhanced Parallel Advisor, which serves as a sort of automated parallel programming tutor.

At this year's IDF, I spent a bit of time meeting with the folks at Intel and speaking with them one on one, and one of the things they told me was that they're hoping people will be able to use this tool to "play with" parallelism and learn about it. Parallel Advisor certainly does just that. But it's not just for learning; it's a professional-grade tool that even parallel pros can use.

Intel Parallel Studio 2011 is priced at $799 for the full product, with individual components (Parallel Composer, Parallel Advisor, Parallel Inspector, Parallel Amplifier) available for $399 apiece. Parallel Studio is also available in a free 30-day trial version. The product works with all but the Express editions of Visual Studio 2005, 2008 and 2010.

Parallel Composer

Parallel Composer is the coding aspect of Parallel Studio, and consists of extensions to the C++ language and a set of libraries that simplify writing parallel code. The improvements to Parallel Composer are the addition of Cilk Plus, the new version 3.0 of Threading Building Blocks and the new (but still beta) Array Building Blocks. Together these libraries and extensions make it considerably easier to write parallel code.

Cilk was originally a C-based language created at MIT that included constructs meant for parallel programming. In July of 2009, Intel purchased Cilk Arts, the main company researching and furthering Cilk on a commercial basis. Intel then began working Cilk into its C++ compiler, and the result is Cilk Plus, a set of extensions to C++. So now, in addition to its original support for the OpenMP C++ extensions, the compiler accepts Cilk Plus code. And Cilk Plus code is actually very easy to write. Here's an example line of code from the samples:

cilk_for(int i=0; i<size; i++) {

This is a for loop that runs in parallel, using multiple cores when possible.
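To put that line in context, here is a minimal sketch of a complete function built around such a loop. The helper name scale_all is hypothetical and not taken from Intel's samples. The sketch also assumes that a Cilk Plus-enabled compiler defines the __cilk macro; on any other compiler it falls back to an ordinary for loop, so the code still builds and simply runs serially.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Assumption: compilers with Cilk Plus enabled define __cilk. Elsewhere we
// fall back to the serial version, where cilk_for is a plain for loop.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

// scale_all is a hypothetical helper, not part of the Intel samples.
// Each iteration writes only data[i], so the iterations are independent
// and can run in any order or at the same time.
void scale_all(std::vector<double>& data, double factor) {
    // The loop index is an int, as in the sample line, so guard the size.
    assert(data.size() <= static_cast<std::size_t>(INT_MAX));
    const int size = static_cast<int>(data.size());
    cilk_for (int i = 0; i < size; i++) {
        data[i] = data[i] * factor;
    }
}
```

Calling scale_all(v, 2.0) on a vector holding 1, 2 and 3 leaves it holding 2, 4 and 6, whether or not the iterations actually ran in parallel.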

At its core, Cilk Plus adds just three keywords to the C++ language: cilk_for, cilk_spawn and cilk_sync. The cilk_spawn keyword marks a function call that is allowed to run in parallel with the code that follows it in the caller; the runtime decides whether it actually does, so a spawn is not the same thing as creating a thread by hand. That's pretty easy. And cilk_sync waits for the calls spawned by the current function to complete.
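The other two keywords are easiest to see together. The following is a small sketch using the customary recursive Fibonacci illustration; it is my own example rather than one from the product's samples. As an assumption, it relies on the __cilk macro to detect a Cilk Plus-enabled compiler, and on other compilers it defines the two keywords away so the function runs as ordinary serial C++.

```cpp
#include <cassert>

// Assumption: compilers with Cilk Plus enabled define __cilk. Elsewhere the
// keywords expand to nothing, which leaves a valid serial program.
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

// Deliberately naive recursive Fibonacci, used only to show the keywords.
long fib(int n) {
    assert(n >= 0);
    if (n < 2) {
        return n;
    }
    long a = cilk_spawn fib(n - 1);  // allowed to run in parallel with the next line
    long b = fib(n - 2);             // the caller keeps working in the meantime
    cilk_sync;                       // wait for the spawned call before reading a
    return a + b;
}
```

For example, fib(10) returns 55 with or without parallel execution. The cilk_sync matters because a is not safe to read until the spawned call has finished.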

Of course, since these keywords are built into the C++ language that the Intel C++ compiler recognizes, the code you write won't port to other compilers. That may or may not be a problem for you, depending on your needs.

The Intel C++ Compiler that ships with Parallel Studio is considered part of Parallel Composer. In addition to the compiler, Parallel Composer also includes Parallel Building Blocks, which is a set of two template libraries that aid in writing parallel code. The main reason for this inclusion is that the standard C++ library (which includes all the usual classes like std::map and so on) isn't thread-safe. You can, technically, carefully write code with the standard library that is thread-safe, but it's a lot of work. The advantage to the Parallel Building Blocks, however, is that you don't need to work so hard. The entire library is automatically thread-safe and includes a great amount of code that takes out the headaches of worrying about who is doing what and when.
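To give a sense of the work involved in doing it by hand, here is a minimal sketch that guards a std::map with a mutex so several threads can update it safely. It is not Parallel Building Blocks code. The WordCounts class and the count_in_parallel function are hypothetical names of my own, and the sketch assumes a compiler with C++11 threading support (std::thread and std::mutex).

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical wrapper: every access to the std::map goes through one mutex.
// This is the bookkeeping a ready-made concurrent container would hide.
class WordCounts {
public:
    void add(const std::string& word) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++counts_[word];
    }

    int get(const std::string& word) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, int> counts_;
};

// Starts `threads` workers that each record `word` `per_thread` times,
// waits for all of them, and returns the final count.
int count_in_parallel(const std::string& word, int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    WordCounts counts;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counts, &word, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counts.add(word);
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return counts.get(word);
}
```

Without the lock in add, concurrent updates to the same map would be a data race. Forgetting the lock in even one accessor reintroduces the problem, which is the kind of headache a thread-safe library is meant to remove.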

Parallel Building Blocks is actually two distinct libraries: Threading Building Blocks and Array Building Blocks. Threading Building Blocks isn't new, but the version that ships with Parallel Studio 2011 is new (version 3.0). Array Building Blocks is new; it's an array library that greatly simplifies threading with data structures. (And at the time of this writing, the version of Array Building Blocks shipping with Parallel Studio 2011 is technically still a beta version, although it's already quite stable.)

Parallel Advisor 2011

Writing parallel code isn't always easy, and if you already have a large amount of code that isn't parallelized, it can be a real headache trying to figure out how you can parallelize your code. That's where Parallel Advisor 2011 comes in: It analyzes your program and advises you on where to add parallelization.

When I reviewed the first version of Parallel Studio, I mentioned that it included a product called "Parallel Advisor Lite." At the time, it wasn't exactly clear why the word "Lite" was included in the title. I'm still not sure of the exact answer, but I'm guessing it was because the company was anticipating the release of the full product with the next version, which is where we are now. The full version is called Parallel Advisor 2011, and it's a huge improvement over the Lite version of last year.

Parallel Advisor includes a window in Visual Studio that's a workflow for analyzing and getting advice on where to parallelize your code. The process begins with the Survey Target stage, in which the Parallel Advisor runs your program and analyzes it while it's running to determine where parallelization would fit.

Once the analysis is complete, you can use the tool to add annotations to your program in proposed places where parallelization would work. This doesn't actually parallelize the code; instead, it tells Parallel Advisor to monitor the following code in the next step to check for possible parallelization.
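As a rough picture of what annotated code looks like, consider the sketch below. The SKETCH_ macros are hypothetical stand-ins that expand to nothing; the real annotation macros come from a header supplied with Parallel Advisor, and their exact names and arguments are not reproduced here. The function name mean_of_squares is also my own. The point is only that the markers bracket a candidate region and the work inside it while the loop itself stays serial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the tool's annotation macros. They expand to
// nothing, so this sketch builds with any compiler and changes no behavior.
#define SKETCH_SITE_BEGIN(name)
#define SKETCH_SITE_END(name)
#define SKETCH_TASK_BEGIN(name)
#define SKETCH_TASK_END(name)

// Still an ordinary serial loop: the markers only say "watch this region".
double mean_of_squares(const std::vector<double>& values) {
    assert(!values.empty());
    double total = 0.0;
    SKETCH_SITE_BEGIN(mean_site);          // candidate parallel region starts
    for (std::size_t i = 0; i < values.size(); ++i) {
        SKETCH_TASK_BEGIN(square_task);    // one unit of work per iteration
        total += values[i] * values[i];    // note: total is shared by every iteration
        SKETCH_TASK_END(square_task);
    }
    SKETCH_SITE_END(mean_site);            // candidate parallel region ends
    return total / static_cast<double>(values.size());
}
```

The shared total variable is exactly the sort of thing that would need attention before this loop could really run in parallel, since every iteration updates it.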

After the annotations are added, you rebuild your program and run it again. The code is still serial, not parallel, but Parallel Advisor now monitors these specific places to estimate what kind of performance increase you'll get if a given section is parallelized.
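I don't know the model Parallel Advisor uses for its estimates, but a useful back-of-the-envelope check on any such number is Amdahl's law: if a fraction p of the run time can be parallelized across n cores, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below simply evaluates that formula; the function name amdahl_speedup is my own.

```cpp
#include <cassert>

// Upper bound on speedup from Amdahl's law.
// parallel_fraction: share of serial run time that can be parallelized (0..1).
// cores: number of cores the parallel part is spread across (at least 1).
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    const double serial_part = 1.0 - parallel_fraction;
    const double parallel_part = parallel_fraction / cores;
    return 1.0 / (serial_part + parallel_part);
}
```

On a dual-core machine, for instance, a program that spends half its time in parallelizable code can run at most about 1.33 times faster, and even a fully parallel program tops out at 2 times.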

Once the advising is complete, you know where you should add parallelization. You can then add the actual parallel code using Cilk Plus and the Parallel Building Blocks. The product runs a correctness check to help suss out potential data-sharing issues introduced by the code changes.

I tried out all these steps on one of the several sample programs that come with Parallel Studio and did see a definite increase in performance. The computer I was using had only a dual-core processor, but I could see the load from the application spread across both cores in Windows Task Manager.

While it isn't true "artificial intelligence," Parallel Advisor is certainly a step toward exactly that. You don't have to be a total parallelization guru to be able to analyze your code and learn how to add parallelization to it. Parallel Advisor does a huge amount of the hard work for you; it's almost like having one of Intel's parallel gurus sitting right there next to you.