Optimizing Apps for Multicore Use
REVIEW: Intel Parallel Studio Helps Developers Exploit Multiple Cores
Parallel programming is not easy. I remember back in my computer science college courses years ago studying the problems involved in writing algorithms that make use of parallel processors. This was in the late 1980s, when parallel programming was basically understood but the tools to accomplish it were lacking. Multiprocessor computers were rare back then, so such tools weren't very important to everyday programmers.
Today, however, processors with multiple cores are commonplace, which has created a need for new tools that make parallel programming easier.
To help programmers write code that makes use of multiple cores, Intel has released Intel Parallel Studio, which works hand in hand with Microsoft Visual Studio and supports only Microsoft Windows (XP or later).
Intel Parallel Studio consists of the following components:
-Parallel Inspector, an analysis tool that locates threading and memory problems;
-Parallel Composer, the set of tools that includes the Intel C++ compiler and associated libraries;
-Parallel Amplifier, a tool that analyzes the performance of your program; and
-Parallel Advisor Lite, a tool that guides you through several steps to help prepare your program for parallelism. (Technically, Parallel Advisor Lite isn't part of Intel Parallel Studio but is a separate download available for free at http://whatif.intel.com.)
I tested Intel Parallel Studio and found no problems, despite looking for some. It's a superior product and definitely something a C++ programmer should check out.
Applications developed with Intel Parallel Studio will be forward-compatible with future Intel multicore processors, including "Larrabee," a general-purpose multicore x86-based processor Intel is developing with high-performance graphics capabilities built in.
Intel refers to this forward compatibility as future-scalable. That is, applications built with Parallel Studio will work with processors that have more cores than today's processors have, and will take advantage of the additional cores.
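One general way code can keep scaling as core counts grow is to size its work by the number of cores found at run time instead of hard-coding a thread count. The sketch below illustrates that idea in standard C++; it is not a description of how Parallel Studio achieves it, and the helper names worker_count and split_range are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: number of workers to use on this machine.
// hardware_concurrency() may return 0 when the count is unknown,
// so fall back to a single worker in that case.
unsigned worker_count() {
    return std::max(1u, std::thread::hardware_concurrency());
}

// Hypothetical helper: split the index range [0, n) into `workers`
// contiguous half-open chunks whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t> >
split_range(std::size_t n, unsigned workers) {
    std::vector<std::pair<std::size_t, std::size_t> > chunks;
    if (workers == 0) workers = 1;
    const std::size_t base = n / workers;   // minimum chunk size
    const std::size_t extra = n % workers;  // first `extra` chunks get one more
    std::size_t begin = 0;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        chunks.push_back(std::make_pair(begin, begin + len));
        begin += len;
    }
    return chunks;
}
```

Calling split_range(n, worker_count()) yields two chunks on a dual-core machine and four on a quad-core machine without any change to the source.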
Searching for Code Problems
During tests, the Parallel Inspector tool helped me spot some of the most common problems in parallel programming, particularly deadlocks and data races.
Rather than simply inspecting the code itself, the tool runs your program and monitors it, looking for these problems. While your program is being analyzed, it takes much longer to run. My test case took more than 10 times as long to run, but the payoff was a comprehensive list of the errors found, including data races, in the form of a to-do list. I could then click on the errors and go right to the source code line that produced the problem.
Although the Inspector finds errors as a program is running and can show you where in your source code the problems occurred, it only gives hints on fixing them. Ultimately it is up to you, as a good software engineer, to understand your code enough to recognize the problems the Inspector found and to fix them correctly.
In the sample case I tried, the Inspector discovered that multiple threads were trying to write to the same memory location simultaneously, which suggested I needed a critical section. Creating a critical section was easy. The Intel C++ compiler that's provided as part of the Parallel Composer component fully supports the OpenMP standard, an API specification for C, C++ and Fortran that lets you use pragmas in your code to specify multithreaded features such as critical sections.
That simplifies your job: Instead of calling into the operating system to create a critical section, you just add a pragma (like so: #pragma omp critical) before the statement or block that is to be the critical section.
In addition to directives such as these pragmas, the Intel C++ compiler includes its own language extensions, such as this:
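To make the pragma concrete, here is a minimal sketch of the kind of fix described above. It is not the test program used in this review; the function name count_even and the shared counter are hypothetical. Several threads update one variable, and the critical section lets only one thread at a time perform the update.

```cpp
#include <vector>

// Hypothetical helper: counts the even values in a vector.
// even_count is shared by all threads, so an unprotected ++ would be
// the kind of data race a tool like the Inspector reports.
int count_even(const std::vector<int>& values) {
    int even_count = 0;
    const int n = static_cast<int>(values.size());

    // Split the loop iterations across the available threads.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (values[i] % 2 == 0) {
            // Only one thread at a time may execute this block.
            #pragma omp critical
            {
                ++even_count;
            }
        }
    }
    return even_count;
}
```

If the compiler is not run with OpenMP enabled, the pragmas are ignored and the loop simply runs on one thread with the same result. For a simple counter like this one, a reduction clause would usually be faster than a critical section; the critical section is used here because it is the construct the text describes.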
__par for (i = 0; i < size; i++)
Additionally, the compiler comes with a library of optimized, threaded functions called the Intel IPP (Integrated Performance Primitives) and a template-based threading library called the Intel TBB (Threading Building Blocks). All of these are powerful approaches to writing parallel programs that make use of multicore processors. And, if you do your job right, the code created by the Intel compiler will make use of all the cores in the processor, including on non-Intel processors.
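The __par for extension shown above is specific to the Intel compiler. For comparison, the sketch below uses the standard OpenMP counterpart, #pragma omp parallel for, which spreads loop iterations across the available cores. The function names scale_in_place and sum_all are hypothetical and chosen only for illustration.

```cpp
#include <vector>

// Hypothetical helper: multiplies every element by factor.
// Each iteration touches a different element, so the iterations are
// independent and need no critical section.
void scale_in_place(std::vector<double>& data, double factor) {
    const int size = static_cast<int>(data.size());
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        data[i] *= factor;
    }
}

// Hypothetical helper: adds up all elements.
// The reduction clause gives each thread a private copy of total and
// combines the copies when the loop ends, avoiding a data race.
double sum_all(const std::vector<double>& data) {
    double total = 0.0;
    const int size = static_cast<int>(data.size());
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < size; i++) {
        total += data[i];
    }
    return total;
}
```

Because the pragmas are hints to the compiler, the same source still builds and runs correctly, on a single thread, when OpenMP support is switched off.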
Optimizing Apps for Multicore Use
After you've used Parallel Inspector to identify problems, you can then fine-tune your application and verify that it's making optimal use of the processor cores. This is where the Parallel Amplifier comes in. Like the Inspector, Amplifier analyzes your program as it runs, this time to check that it runs optimally and takes advantage of the cores. Of course, you're limited by the number of cores on the machine you're testing on; if you have a dual-core processor, you won't be able to test how your software will perform on a machine with a quad-core processor.
From a user standpoint, I can think of many applications that I use daily that could benefit from being redesigned under Intel Parallel Studio. Think how many times you look at your Task Manager and see a program that is hogging one core at 100 percent while not making any use of the other core. If coded properly, the programs could use part of each core and run faster, leaving plenty of room for other programs to run without slowing your machine down.
A fully functional, 30-day trial version of Parallel Studio can be downloaded from http://software.intel.com/sites/products/irc/ipsdownload.html?Sequence=984485.
The full version of Parallel Studio costs $799, not including Parallel Advisor Lite. Parallel Composer, Parallel Inspector and Parallel Amplifier can also be purchased separately for $399 each. Academic pricing is available, with the full Parallel Studio costing $199.
Jeff Cogswell can be reached at firstname.lastname@example.org.