Inside Intel Larrabee

Aug 6, 2008

eWeek content and product recommendations are editorially independent. We may make money when you click on links to our partners.

Inside Intel Larrabee – CPU-GPU Convergence

Intel believes Larrabee embodies the convergence of CPU and GPU (graphics processing unit) performance. Pushed to their performance limits, CPUs are adding more cores and more computing power. Motivated by higher-quality graphics and data-parallel programming, GPUs are being tapped for more general computing tasks.

Inside Intel Larrabee – Larrabee Key Differences from Typical GPUs

Each Larrabee core is a complete x86 core, with context switching and pre-emptive multitasking, virtual memory and page swapping, and fully coherent caches at every level of the hierarchy. Communication between blocks is efficient: a ring bus provides full interprocessor communication, the Level 1 and Level 2 caches offer low latency and high bandwidth, and synchronization between cores and caches is fast. Fixed-function logic doesn't get in the way: there is no back-end blender between the cores and memory, and no rasterization logic between the vertex and pixel stages. The result is flexible load balancing and general-purpose functionality.
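
Because every core runs general x86 code, no core is tied to one pipeline stage, so work can be balanced across all of them. The toy model below is a minimal sketch of that idea, not Intel's scheduler: it compares a hypothetical design whose cores are statically split between vertex and pixel work against one where any core can take any task. The task costs, the 2/2 split and the helper names (greedy_makespan and friends) are illustrative assumptions; tasks are assigned with the standard greedy longest-task-first heuristic.

```python
import heapq

def greedy_makespan(costs, n_cores):
    """Longest-task-first list scheduling: each task goes to the least-loaded core.
    Returns the finishing time of the busiest core (the makespan)."""
    loads = [0] * n_cores  # busy time per core; a list of zeros is a valid min-heap
    for cost in sorted(costs, reverse=True):
        least = heapq.heappop(loads)         # least-loaded core so far
        heapq.heappush(loads, least + cost)  # give it this task
    return max(loads)

def fixed_split_makespan(vertex_costs, pixel_costs, vertex_cores, pixel_cores):
    """Hypothetical fixed-function-style design: each stage owns its own cores."""
    return max(greedy_makespan(vertex_costs, vertex_cores),
               greedy_makespan(pixel_costs, pixel_cores))

def unified_makespan(vertex_costs, pixel_costs, n_cores):
    """General-purpose design: any core can run any task."""
    return greedy_makespan(vertex_costs + pixel_costs, n_cores)

# Hypothetical frame: light vertex work, heavy pixel work, 4 cores in total.
vertex = [1, 1, 1, 1]
pixel = [3, 3, 3, 3, 3, 3]
print(fixed_split_makespan(vertex, pixel, 2, 2))  # 9
print(unified_makespan(vertex, pixel, 4))         # 6
```

With the fixed split, the two vertex cores go idle at time 2 while the pixel cores run until time 9; the unified pool finishes the same work at time 6.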

Inside Intel Larrabee – Larrabee Processor Block Diagram

Cores communicate over a wide ring bus, which gives them fast access to memory and to the fixed-function blocks, as well as fast cache-coherency traffic. The L2 cache is partitioned among the cores, which provides high aggregate bandwidth and allows data to be replicated and shared.
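
A rough way to picture the partitioned L2 is one local cache subset per core, where reading a line copies it into the reader's subset (replication and sharing) and writing a line removes the other copies (coherency). The sketch below is a toy model under stated assumptions: the ring is assumed to carry traffic in both directions, the invalidate-on-write policy is assumed, and PartitionedL2, ring_hops and holders are hypothetical names rather than anything Intel describes on the slide.

```python
def ring_hops(src, dst, n_stops):
    """Stops between two ring positions, assuming traffic may travel either way."""
    d = abs(src - dst) % n_stops
    return min(d, n_stops - d)

class PartitionedL2:
    """Toy model of an L2 cache split into one local subset per core.
    Reads copy (replicate) a line into the reader's subset; a write invalidates
    every other copy. Write-back to memory is not modeled."""

    def __init__(self, n_cores):
        self.subsets = [{} for _ in range(n_cores)]  # per core: line address -> value
        self.memory = {}                             # backing store: line -> value

    def read(self, core, line):
        local = self.subsets[core]
        if line not in local:
            # Any other valid copy is up to date because writes invalidate the rest.
            value = next((s[line] for s in self.subsets if line in s),
                         self.memory.get(line, 0))
            local[line] = value  # replicate into this core's own subset
        return local[line]

    def write(self, core, line, value):
        for i, subset in enumerate(self.subsets):
            if i != core:
                subset.pop(line, None)  # invalidate copies held by other cores
        self.subsets[core][line] = value

    def holders(self, line):
        """Cores whose L2 subset currently holds the line."""
        return [i for i, s in enumerate(self.subsets) if line in s]

l2 = PartitionedL2(4)
l2.memory[0x40] = 7
l2.read(0, 0x40)
l2.read(2, 0x40)
print(l2.holders(0x40))  # [0, 2]: the line is shared, one copy per reader
l2.write(1, 0x40, 9)
print(l2.holders(0x40))  # [1]: the other copies were invalidated
```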

Inside Intel Larrabee – Larrabee x86 Chip Block Diagram

Each core has separate scalar and vector units, each with its own registers. The scalar unit is an in-order x86 core, and the vector unit executes 16 32-bit operations per clock. Execution pipelines are short, with fast access from the L1 cache and a direct connection to each core's own subset of the L2 cache. Prefetch instructions can load both the L1 and L2 caches.
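
With 16 lanes of 32 bits, a loop over an array of any length is processed 16 elements per vector instruction, a technique usually called strip-mining. The sketch below emulates this in Python under two assumptions: lanes use unsigned 32-bit wraparound, and the final partial chunk is zero-padded with its extra lanes discarded. vadd_u32 and add_arrays are hypothetical names, not Larrabee instructions.

```python
LANES = 16            # 16 x 32-bit lanes per vector operation, per the slide
MASK32 = 0xFFFFFFFF   # assume unsigned 32-bit wraparound

def vadd_u32(a, b):
    """Hypothetical single vector instruction: 16 lane-wise 32-bit adds."""
    assert len(a) == len(b) == LANES
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def add_arrays(a, b):
    """Strip-mine an add of any length into 16-lane chunks.
    Returns (sums, number_of_vector_instructions)."""
    assert len(a) == len(b)
    out, ops = [], 0
    for start in range(0, len(a), LANES):
        chunk_a = a[start:start + LANES]
        chunk_b = b[start:start + LANES]
        live = len(chunk_a)           # lanes holding real data
        pad = [0] * (LANES - live)    # zero-fill the tail of the last chunk
        out.extend(vadd_u32(chunk_a + pad, chunk_b + pad)[:live])
        ops += 1
    return out, ops

sums, ops = add_arrays(list(range(40)), [1] * 40)
print(ops)  # 3 vector instructions cover 40 elements (16 + 16 + 8)
```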

Inside Intel Larrabee – Larrabee Vector Unit Block Diagram

The vector unit has a complete instruction set. Scatter/gather supports vector loads and stores from non-contiguous addresses, and mask registers select which lanes are written, which allows data-parallel flow control. This makes it possible to map a separate execution kernel to each VPU (vector processing unit) lane. Vector instructions support fast reads from the L1 cache; numeric type conversion and data replication while reading from memory; rearranging lanes on register read; fused multiply-add with three operands; and Int32, Float32 and Float64 data.
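
The following sketch shows how gather, fused multiply-add and mask registers combine to run a per-lane if/else without branching: both sides are computed, and each result is written only into the lanes whose mask bit selects it. It emulates 16 lanes with Python lists; gather, fmadd, masked_write, kernel and the sample data are hypothetical illustrations, not Larrabee instruction names.

```python
LANES = 16  # one vector register holds 16 32-bit lanes

def gather(memory, indices):
    """Vector load from arbitrary addresses: lane i reads memory[indices[i]]."""
    return [memory[i] for i in indices]

def fmadd(a, b, c):
    """Lane-wise a*b + c. Python ints are exact, so this ignores the single
    rounding step that distinguishes a hardware fused multiply-add."""
    return [x * y + z for x, y, z in zip(a, b, c)]

def masked_write(dst, src, mask):
    """Write src into dst only in lanes whose mask bit is set."""
    return [s if m else d for d, s, m in zip(dst, src, mask)]

def kernel(x):
    """Per-lane 'if x >= 0: y = 2*x + 1 else: y = -x', with no branches."""
    assert len(x) == LANES
    mask = [v >= 0 for v in x]  # a compare sets one mask bit per lane
    y = [0] * LANES
    y = masked_write(y, fmadd(x, [2] * LANES, [1] * LANES), mask)  # 'then' lanes
    y = masked_write(y, [-v for v in x], [not m for m in mask])    # 'else' lanes
    return y

memory = list(range(-8, 8))          # hypothetical data: -8 .. 7
indices = list(range(15, -1, -1))    # gather the values in reverse order
print(kernel(gather(memory, indices)))
# [15, 13, 11, 9, 7, 5, 3, 1, 1, 2, 3, 4, 5, 6, 7, 8]
```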

Inside Intel Larrabee – More Cores = More Scalability

Intel claims that performance in the games “F.E.A.R.,” “Half-Life 2: Episode Two” and “Gears of War” increases as more cores are added to Larrabee.

Inside Intel Larrabee – Transparency Example with Sorting

Sorting lets artists distinguish detail in foreground and background images. The small dragon was drawn first, so its wing appears behind the large dragon. Sorting the rendered pixels makes the larger dragon's wing display correctly.
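
A common way to make transparency independent of draw order is to keep each pixel's fragments, sort them by depth and blend them back to front with the "over" operator. The sketch below does this for a single pixel with single-channel colors; composite and the fragment values are hypothetical, and this is not necessarily how Intel's renderer stores or sorts fragments.

```python
def composite(fragments, background, sort_by_depth=True):
    """Blend one pixel's transparent fragments with the 'over' operator.
    fragments: (depth, color, alpha) tuples in the order they were drawn;
    a larger depth means farther from the viewer."""
    if sort_by_depth:
        fragments = sorted(fragments, key=lambda f: f[0], reverse=True)  # back to front
    result = background
    for _depth, color, alpha in fragments:
        result = color * alpha + result * (1.0 - alpha)
    return result

near = (1.0, 1.0, 0.5)   # hypothetical half-transparent surface close to the camera
far = (2.0, 0.5, 0.5)    # hypothetical half-transparent surface behind it
drawn = [near, far]      # the near fragment happened to be drawn first

print(composite(drawn, 0.0))                       # 0.625 (depth-sorted)
print(composite(drawn, 0.0, sort_by_depth=False))  # 0.5 (draw order, incorrect)
```

Without sorting, the far fragment is blended over the near one, so the result depends on the order in which objects happened to be drawn.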

Inside Intel Larrabee – Transparency Example with Fog

The use of multiple layers allows artists to correctly render detail even in the background. Without multiple layers, the fog is not accurately applied to the small dragon seen through the wing. With multiple layers, the fog affects the small dragon correctly, even when seen through the wing.
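
The difference between the two approaches can be sketched for a single pixel: with multiple layers, each fragment is fogged at its own depth before compositing; with a single layer, the blended result is fogged using only the nearest depth, so a distant object seen through a translucent surface receives too little fog. The sketch assumes standard linear fog and single-channel colors; the helper names, fog parameters and fragment values are hypothetical.

```python
def linear_fog(color, depth, fog_color, fog_start, fog_end):
    """Blend a color toward fog_color: no fog at fog_start, full fog at fog_end."""
    f = (fog_end - depth) / (fog_end - fog_start)
    f = min(1.0, max(0.0, f))
    return color * f + fog_color * (1.0 - f)

def blend_back_to_front(fragments, background):
    """'Over' compositing of (depth, color, alpha) fragments, farthest first."""
    result = background
    for _depth, color, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        result = color * alpha + result * (1.0 - alpha)
    return result

def fog_per_layer(fragments, background, fog):
    """Multiple layers: fog each fragment at its own depth, then composite."""
    fogged = [(d, linear_fog(c, d, *fog), a) for d, c, a in fragments]
    return blend_back_to_front(fogged, background)

def fog_single_layer(fragments, background, fog):
    """Single layer: composite first, then fog using only the nearest depth."""
    nearest = min(d for d, _, _ in fragments)
    return linear_fog(blend_back_to_front(fragments, background), nearest, *fog)

FOG = (0.0, 0.0, 10.0)          # hypothetical fog color, fog start, fog end
wing = (1.0, 1.0, 0.5)          # hypothetical translucent wing near the camera
small_dragon = (9.0, 1.0, 1.0)  # hypothetical opaque dragon far behind the wing
pixel = [wing, small_dragon]

print(fog_per_layer(pixel, 0.0, FOG))     # approximately 0.5: distant dragon is mostly fogged
print(fog_single_layer(pixel, 0.0, FOG))  # approximately 0.9: dragon only gets the wing's light fog
```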