Intel Details Larrabee Processor Architecture

Intel describes aspects of its Larrabee microarchitecture, including the design of an x86 processing core developed specifically for the chip. The chip maker explains why its engineers believe the Larrabee processor will usher in a new era of parallel software programming.

Intel is offering the first in-depth look at its "Larrabee" processor, which the chip maker plans to aim at a range of graphics and visual computing applications using x86 processing cores instead of a more traditional GPU design.

In a paper, "Larrabee: A Many-Core x86 Architecture for Visual Computing," Intel engineers offered several new details about the forthcoming Larrabee graphics processing unit, including the fact that Intel derived the instruction pipeline for the individual x86 cores from the company's Pentium chip.

In addition, Larrabee will support Microsoft's DirectX API and the OpenGL API, which Intel hopes will motivate a legion of software developers to create new visual- and graphics-intensive applications while taking advantage of the traditional Intel Architecture found in Larrabee's x86 cores.

The first of the Larrabee chips, which are destined for the high-end PCs that use discrete graphics cards, will not arrive until 2009 or 2010, although Intel is expected to release samples starting in late 2008. Larrabee is described as a "many-core" processor, which means that it's likely to contain 10 or more individual x86 CPU cores within the silicon package. (Intel's upcoming Nehalem processors are likely to have up to eight cores.)

While Intel engineers have spoken about Larrabee and its place within high-performance computing, the paper makes clear that the first of the Larrabee processors are designed for the gaming market, where the chip will compete against high-end GPU offerings from Nvidia and from ATI, which is owned by Advanced Micro Devices. The fact that Intel is supporting the industry-standard DirectX and OpenGL APIs shows that the chip maker is looking to encourage developers to create new gaming applications on its architecture.

Intel is also betting that Larrabee will usher in a new era of parallel computing by offering developers a way to create highly specialized applications, such as games that require visual computing or scientific software applications that require intensive graphics capabilities, using the familiar x86 instruction set along with the C and C++ programming languages.

Nvidia, with its Tesla 10 series GPGPU (general-purpose GPU), requires developers to learn a new C-based programming environment called CUDA (Compute Unified Device Architecture), which allows the GPU to be programmed more like a CPU.

For their part, AMD and its ATI graphics division are embracing OpenCL, an open standard programming language. AMD is also moving toward combining the CPU and GPU on the same piece of silicon as part of its Accelerated Computing program.

In short, Intel is looking to combine the full programmability of a CPU with the parallel throughput found in graphics processors.

"What the graphics and general data parallel application market needs is an architecture that provides the full programming abilities of a CPU, the full capabilities of a CPU together with the parallelism that is inherent in graphics processors," said Larry Seiler, a senior principal engineer with Intel. "Larrabee provides [that] and it's a practical solution to the limitations of current graphics processors."

This development could lead to a new way of looking at the capabilities of CPUs and GPUs in the commercial market.

"What stands out is that Intel views the CPU as the best GPU," said John Spooner, an analyst with Technology Business Research.

"Intel is able to apply x86 to rendering graphics rather than adopting a new or different architecture, which is clearly directly opposite of Nvidia's view of the world," Spooner added. "These companies are sure to engage in a public jousting match over whose architecture is better. The one that comes out on top, though, will be determined by performance and how well accepted the architecture is by developers."

At the heart of Larrabee is a series of simple x86 cores that are built with short instruction pipelines derived from the Pentium chip. Each core will also include what Intel describes as a vector processing unit, which enhances the performance of graphics and video applications.
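To make the idea of a vector processing unit concrete, here is a minimal sketch in standard C++ of what such a unit does: it applies one operation to a whole group of data elements at once. Nothing here is Intel's API or taken from the paper. The names `Lanes`, `multiply_add` and `scale_and_offset`, and the lane count of 16, are assumptions made only for illustration, and the inner loops stand in for what real vector hardware would do in a single instruction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed lane count, chosen for illustration only.
constexpr std::size_t kLanes = 16;
using Lanes = std::array<float, kLanes>;

// Models one vector instruction: the same multiply-add on every lane.
Lanes multiply_add(const Lanes& a, const Lanes& b, const Lanes& c) {
    Lanes out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        out[lane] = a[lane] * b[lane] + c[lane];
    }
    return out;
}

// Hypothetical helper: brighten a buffer of pixel intensities,
// processing kLanes pixels per "instruction".
std::vector<float> scale_and_offset(const std::vector<float>& pixels,
                                    float gain, float bias) {
    std::vector<float> result(pixels.size());
    Lanes gains;
    Lanes biases;
    gains.fill(gain);
    biases.fill(bias);

    std::size_t i = 0;
    for (; i + kLanes <= pixels.size(); i += kLanes) {
        Lanes group;
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            group[lane] = pixels[i + lane];
        }
        const Lanes scaled = multiply_add(group, gains, biases);
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            result[i + lane] = scaled[lane];
        }
    }
    // Pixels that do not fill a whole group are handled one at a time.
    for (; i < pixels.size(); ++i) {
        result[i] = pixels[i] * gain + bias;
    }
    assert(i == pixels.size());
    return result;
}
```

Calling `scale_and_offset` on a buffer of 20 pixels would handle the first 16 in one pass through `multiply_add` and the remaining four individually, which is the kind of saving that makes wide vector units attractive for graphics and video work.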

The Larrabee architecture will support four execution threads per core, with each thread having its own register set, which helps hide memory latency. With this setup, Larrabee offers a simple, efficient in-order instruction pipeline but retains some of the benefits of an out-of-order pipeline, which helps when running applications designed to run in parallel. The short pipelines on Larrabee will also allow faster access to the Level 1 cache in each core.

All the Larrabee x86 cores (at this point Intel gave no guidance as to how many cores Larrabee will use) will share a large L2 cache, which will be partitioned among the different cores and allow for high bandwidth and data sharing.

The entire Larrabee chip architecture will be built on what Intel called a "bidirectional ring network," which should also allow faster communication among the individual x86 cores.
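One way to see why a bidirectional ring can speed up communication is to count the hops a message needs between two stops on the ring. The following is a rough sketch in C++, not Intel's routing logic, which the paper summary here does not describe. The function names and the idea of numbering the cores as ring stops from 0 to n-1 are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hops from stop `src` to stop `dst` on a ring of `n` stops when
// traffic can travel in only one direction.
std::size_t one_way_hops(std::size_t src, std::size_t dst, std::size_t n) {
    assert(n > 0 && src < n && dst < n);
    return (dst + n - src) % n;
}

// Hops on a bidirectional ring: a message takes the shorter direction.
std::size_t bidirectional_hops(std::size_t src, std::size_t dst, std::size_t n) {
    const std::size_t forward = one_way_hops(src, dst, n);
    const std::size_t backward = one_way_hops(dst, src, n);
    return forward < backward ? forward : backward;
}
```

On a hypothetical 10-stop ring, a message from stop 1 to stop 0 would need nine hops if traffic flowed in only one direction, but a single hop when it can travel either way. In the worst case the bidirectional ring needs n/2 hops rather than n-1.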

Intel will present the entire technical paper at the SIGGRAPH conference in Los Angeles on Aug. 12.